
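Recently I have been studying Ceph's monitoring system. In Ceph 14.2.1 the MGR already ships a Prometheus plugin, which can replace the older ceph_exporter project as the exporter. The MGR Prometheus plugin monitors a wide range of Ceph state (roughly 390+ metrics), covering practically every metric you can think of and plenty you would not, and it opens another window onto how Ceph works. Enabling the plugin is not the hard part. The hard part is pulling valuable data out of it, which takes long-term accumulation and continuous digging, you know how it is... Enough rambling: this post is only an introductory experiment in integrating prometheus with alertmanager.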

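prometheus and alertmanager are independent components. First, their responsibilities:

  • The prometheus side generates alerts and sends them to alertmanager

    It must be configured with where the alertmanager service is and with the conditions that produce an alert (alert rules)

  • The alertmanager side receives the alerts sent by prometheus and does the follow-up handling (for example, sending the alert to email, WeChat, DingTalk...)

    It must be configured with a route and receivers

Suppose the scenario is this: Prometheus is already collecting Ceph metrics, and if any OSD in the cluster has been down for more than 1 hour, we need to send an email telling the responsible people to deal with it.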
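The "down for more than 1 hour" condition maps onto the `for:` field of a Prometheus alert rule: the alert stays pending while the expression is true and only fires (and is sent to alertmanager) once it has been true continuously for the whole duration. The Python below is a simplified model of that behavior, not Prometheus code; the expression it stands for, something like `count(ceph_osd_up == 0) > 0` using the mgr plugin's `ceph_osd_up` metric, is an assumption.

```python
from typing import List, Optional, Tuple


def evaluate_for_clause(samples: List[Tuple[int, int]], hold_seconds: int) -> List[Tuple[int, str]]:
    """Simplified model of an alert rule with a `for:` duration.

    samples: (timestamp_in_seconds, number_of_down_osds), one per rule evaluation.
    The condition is "at least one OSD is down". The alert is pending while the
    condition has held for less than hold_seconds and firing once it has held that long.
    """
    states = []
    active_since: Optional[int] = None  # when the condition first became true
    for ts, down_osds in samples:
        if down_osds > 0:
            if active_since is None:
                active_since = ts
            state = "firing" if ts - active_since >= hold_seconds else "pending"
        else:
            active_since = None  # condition cleared: the timer starts over next time
            state = "inactive"
        states.append((ts, state))
    return states


# One evaluation every 30 minutes; an OSD goes down at t=1800 and comes back at t=7200.
samples = [(0, 0), (1800, 1), (3600, 1), (5400, 1), (7200, 0)]
states = evaluate_for_clause(samples, hold_seconds=3600)
for ts, state in states:
    print(ts, state)
```

Only the evaluation at t=5400, a full hour after the OSD went down, produces a firing alert; a shorter outage never leaves the pending state, which is exactly what keeps brief OSD flaps from generating email.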

ansible

Command mode

How to view module help

List all modules
ansible-doc -l

Show help for a specific module
ansible-doc -s MODULE_NAME

Basics of the ansible command

Syntax (new)
ansible <host-pattern> [options]

Syntax (old)
ansible <host-pattern> [-f forks] [-m module_name] [-a args]
-f forks: number of parallel processes (forks) to run
-m module_name: the module to use
-a args: module-specific arguments

Common modules

command: the command module (the default module), runs a command on the remote hosts
Examples:
ansible all -a 'date'
ansible all -m command -a 'date'

cron: manage cron jobs
state=:
present: create the job
absent: remove the job
Examples:
ansible webserver -m cron -a 'minute="*/10" job="/bin/echo hello" name="test cron job"'
ansible webserver -a 'crontab -l'
ansible webserver -m cron -a 'minute="*/10" job="/bin/echo hello" name="test cron job" state=absent'

user: manage users
name=: the name of the user to create
Examples:
ansible all -m user -a 'name="user1"'
ansible all -m user -a 'name="user1" state=absent'

group: manage groups
Examples:
ansible webserver -m group -a 'name=mysql gid=306 system=yes'
ansible webserver -m user -a 'name=mysql uid=306 system=yes group=mysql'

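copy: copy files to the remote hosts
src=: path of the local source file
dest=: path of the remote destination file
content=: used instead of src=; the given text is written as the content of the destination file
Examples:
ansible all -m copy -a 'src=/etc/fstab dest=/tmp/fstab.ansible owner=root mode=640'
ansible all -m copy -a 'content="Hello Ansible\nHi MageEdu" dest=/tmp/test.ansible'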

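file: set file attributes
path=: the file path; name or dest may be used instead
To create a symbolic link to a file:
src=: the source file
path=: the path of the symbolic link
state=link: create the link
Examples:
ansible all -m file -a 'owner=mysql group=mysql mode=644 path=/tmp/fstab.ansible'
ansible all -m file -a 'path=/tmp/fstab.link src=/tmp/fstab.ansible state=link'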

ping: test whether the specified hosts can be connected to
Example:
ansible all -m ping

service: manage the running state of a service
enabled=: whether to start at boot, true or false
name=: the service name
state=: the desired state: started, stopped or restarted
Examples:
ansible webserver -a 'service httpd status'
ansible webserver -a 'chkconfig --list httpd'
ansible webserver -m service -a 'enabled=true name=httpd state=started'

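shell: run commands on the remote hosts, especially complex commands that use features such as pipes
Examples:
ansible all -m user -a 'name=user1'
ansible all -m command -a 'echo mageedu | passwd --stdin user1' (the command module does not interpret the pipe, so this does not work)
ansible all -m shell -a 'echo mageedu | passwd --stdin user1' (use the shell module when pipes or variables are involved)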
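The difference between the last two examples can be reproduced locally with Python's subprocess module, used here purely as an analogy (this is not how Ansible is implemented): the command module runs the program with an argument list and no shell, while the shell module hands the whole string to /bin/sh. Assumes a POSIX system with `echo` and `tr`.

```python
import subprocess

# Like the command module: arguments go straight to the program and no shell is
# involved, so "|" is just one more literal argument printed by echo.
no_shell = subprocess.run(
    ["echo", "mageedu", "|", "tr", "a-z", "A-Z"],
    capture_output=True, text=True, check=True,
).stdout

# Like the shell module: /bin/sh parses the string and sets up the pipe.
with_shell = subprocess.run(
    "echo mageedu | tr a-z A-Z",
    shell=True, capture_output=True, text=True, check=True,
).stdout

print(repr(no_shell))    # 'mageedu | tr a-z A-Z\n'
print(repr(with_shell))  # 'MAGEEDU\n'
```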

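script: copy a local script to the remote hosts and run it
Note: the script path refers to the local (control) machine; a relative path is used here
Example:
ansible all -m script -a "test.sh"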

yum: install packages
name=: the package to install; a version may be included
state=: present or latest to install, absent to uninstall
Example:
ansible all -m yum -a "name=vim"

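setup: collect facts from the remote hosts
Before receiving and running management commands, each managed node reports information about itself, such as the OS version and IP addresses, to the ansible control host
ansible all -m setup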

ansible-playbook prerequisites

YAML syntax

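YAML syntax resembles that of other high-level languages and can express data structures such as lists, hashes and scalars simply. Its structure is shown through indentation with spaces, items in a sequence are marked with "-", and the key/value pairs of a map are separated by ":". Here is an example.
name: John Smith
age: 41
gender: Male
spouse:
  name: Jane Smith
  age: 37
  gender: Female
children:
  - name: Jimmy Smith
    age: 17
    gender: Male
  - name: Jenny Smith
    age: 13
    gender: Female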
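To see what structure the indentation and "-" markers encode, here is the same data written as Python literals, i.e. what a YAML loader such as PyYAML's `yaml.safe_load` should return for the example above. It is written out by hand rather than parsed, so no third-party package is needed.

```python
# Mappings become dicts, sequences become lists, scalars become str/int.
person = {
    "name": "John Smith",
    "age": 41,
    "gender": "Male",
    # Indented keys under "spouse:" form a nested mapping.
    "spouse": {"name": "Jane Smith", "age": 37, "gender": "Female"},
    # Each "-" item under "children:" is one list element (itself a mapping).
    "children": [
        {"name": "Jimmy Smith", "age": 17, "gender": "Male"},
        {"name": "Jenny Smith", "age": 13, "gender": "Female"},
    ],
}

print(person["spouse"]["name"])                 # Jane Smith
print([c["name"] for c in person["children"]])  # ['Jimmy Smith', 'Jenny Smith']
```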

Common YAML data types

list: every element of a list starts with "-", for example:

# A list of tasty fruits
- Apple
- Orange
- Strawberry
- Mango

dictionary: a dictionary is made of key: value pairs, for example:

# An employee record
name: Example Developer
job: Developer
skill: Elite

The key: value pairs can also be placed inside {}, for example:
---
# An employee record
{name: Example Developer, job: Developer, skill: Elite}

Basic elements

Variables

  • Variable naming

Variable names may contain only letters, digits and underscores, and must start with a letter.

  • facts

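facts are information sent back by the remote target hosts ansible is communicating with, and they are stored in ansible variables. To get all the facts supported by a given remote host, use: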

ansible hostname -m setup
  • register

Save a task's output in a variable and then use it in other tasks, for example:

tasks:
  - shell: /usr/bin/foo
    register: foo_result
    ignore_errors: True
  • Passing variables on the command line

Variables can also be passed in when running a playbook, for example:

ansible-playbook test.yml --extra-vars "hosts=www user=mageedu"
  • Passing variables through roles

When applying a role to a host you can pass variables and then use them inside the role, for example:

- hosts: webservers
  roles:
    - common
    - { role: foo_app_instance, dir: '/web/htdocs/a.com', port: 8080 }
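The variable naming rule given earlier (letters, digits and underscores, starting with a letter) is easy to check mechanically. A small Python validator as a sketch; the function name is hypothetical, and rejecting Python keywords is an extra assumption based on current Ansible documentation, which lists them as invalid variable names.

```python
import keyword
import re

# Letters, digits and underscores only, and the first character must be a letter.
_VALID_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_valid_var_name(name: str) -> bool:
    """Check a candidate Ansible variable name against the naming rule."""
    return bool(_VALID_NAME.fullmatch(name)) and not keyword.iskeyword(name)


for candidate in ["foo_port", "foo5", "foo-port", "foo port", "foo.port", "12", "class"]:
    print(f"{candidate!r:12} {is_valid_var_name(candidate)}")
```

Hyphens, spaces and dots are rejected because in Jinja2 expressions they would be read as subtraction, separators or attribute access rather than as part of the name.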

Inventory

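ansible is mainly used to operate on hosts in bulk. To conveniently target a subset of those hosts, they can be grouped and named in the inventory file. The default inventory file is /etc/ansible/hosts.

There can be more than one inventory file, and inventories can also be generated dynamically through Dynamic Inventory.

  • Inventory file format

The inventory file uses INI style, and the names in square brackets are group names. A host can belong to several groups at once. In addition, if a target host uses a non-default SSH port, the port can be appended to the host name after a colon.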

[webservers]
www1.magedu.com:2222
www2.magedu.com

[dbservers]
db1.magedu.com
db2.magedu.com
db3.magedu.com

If host names follow a similar naming pattern, a range can be used to list them, for example:
[webservers]
www[01:50].example.com

[databases]
db-[a:f].example.com
  • Host variables

Host variables can be added when defining hosts in the inventory, for use in playbooks, for example:

[webservers]
www1.magedu.com http_port=80 maxRequestsPerChild=808
www2.magedu.com http_port=8080 maxRequestsPerChild=909
  • Group variables

Group variables are assigned to every host in a given group and are available in playbooks, for example:

[webservers]
www1.magedu.com
www2.magedu.com

[webservers:vars]
ntp_server=ntp.magedu.com
nfs_server=nfs.magedu.com
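  • Group nesting

In the inventory, a group can also contain other groups, and variables can be assigned to the hosts in a group. However, these variables can only be used by ansible-playbook, not by ansible. For example: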

[apache]
httpd1.magedu.com
httpd2.magedu.com

[nginx]
ngx1.magedu.com
ngx2.magedu.com

[webservers:children]
apache
nginx

[webservers:vars]
ntp_server=ntp.magedu.com
  • Inventory parameters

When ansible connects over ssh to the remote hosts listed in the inventory, parameters can also specify how it interacts with them; see https://docs.ansible.com/ansible/latest/user_guide/intro_inventory.html
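To make the inventory rules above concrete (host ranges such as www[01:50], [group:children] nesting and [group:vars]), here is a deliberately tiny Python reader. It is a sketch under stated assumptions, not Ansible's parser: all function names are hypothetical, only one range per host pattern is handled, inline host variables are dropped, and a host:port suffix is simply left in the host name.

```python
import re
import string
from typing import Dict, List, Tuple


def expand_host_pattern(pattern: str) -> List[str]:
    """Expand one [start:end] range, e.g. www[01:50].example.com or db-[a:f].example.com."""
    m = re.search(r"\[([^:\]]+):([^\]]+)\]", pattern)
    if m is None:
        return [pattern]
    start, end = m.group(1), m.group(2)
    prefix, suffix = pattern[:m.start()], pattern[m.end():]
    if start.isdigit() and end.isdigit():
        # Keep zero padding when the start value has a leading zero (01 -> 01, 02, ...).
        width = len(start) if start.startswith("0") else 0
        values = [str(i).zfill(width) for i in range(int(start), int(end) + 1)]
    else:
        # Alphabetic range over single lowercase letters.
        letters = string.ascii_lowercase
        values = list(letters[letters.index(start):letters.index(end) + 1])
    return [prefix + v + suffix for v in values]


def parse_inventory(text: str) -> Tuple[Dict[str, List[str]], Dict[str, List[str]], Dict[str, Dict[str, str]]]:
    """Return (group -> hosts, group -> child groups, group -> vars) from INI-style text."""
    members: Dict[str, List[str]] = {}
    children: Dict[str, List[str]] = {}
    group_vars: Dict[str, Dict[str, str]] = {}
    section = "ungrouped"
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
        elif section.endswith(":children"):
            children.setdefault(section[:-len(":children")], []).append(line)
        elif section.endswith(":vars"):
            key, _, value = line.partition("=")
            group_vars.setdefault(section[:-len(":vars")], {})[key.strip()] = value.strip()
        else:
            host = line.split()[0]  # drop inline host variables
            members.setdefault(section, []).extend(expand_host_pattern(host))
    return members, children, group_vars


def hosts_of(group: str, members: Dict[str, List[str]], children: Dict[str, List[str]]) -> List[str]:
    """Hosts directly in a group plus, recursively, the hosts of its child groups."""
    result = list(members.get(group, []))
    for child in children.get(group, []):
        for host in hosts_of(child, members, children):
            if host not in result:
                result.append(host)
    return result


INVENTORY = """
[apache]
httpd1.magedu.com
httpd2.magedu.com

[nginx]
ngx1.magedu.com
ngx2.magedu.com

[webservers:children]
apache
nginx

[webservers:vars]
ntp_server=ntp.magedu.com
"""

members, children, group_vars = parse_inventory(INVENTORY)
webservers = hosts_of("webservers", members, children)
print(webservers)
print(group_vars["webservers"])
print(expand_host_pattern("db-[a:f].example.com"))
```

For the real thing, `ansible-inventory --list` prints how Ansible itself resolves an inventory.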

ansible-playbook

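A playbook is a list of one or more plays. The main job of a play is to map a group of hosts, defined beforehand, onto the roles defined by ansible tasks. At bottom, a task is nothing more than a call to an ansible module. By organizing several plays in one playbook, they can be chained together to stage a whole production according to a predefined arrangement. Here is a simple example.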

- hosts: webnodes
  vars:
    http_port: 80
    max_clients: 256
  remote_user: root
  tasks:
    - name: ensure apache is at the latest version
      yum: name=httpd state=latest
    - name: ensure apache is running
      service: name=httpd state=started
  handlers:
    - name: restart apache
      service: name=httpd state=restarted
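The playbook above defines a `restart apache` handler, but no task lists it under `notify:`, so as written it never runs. In Ansible, a task that reports `changed` can notify a handler by name; notified handlers run once, after the play's tasks, in the order the handlers are defined. The Python below is a simplified model of that ordering (not Ansible code); the notify on the yum task is a hypothetical addition for illustration.

```python
from typing import Callable, Dict, List, Tuple

# (task name, action returning True if it changed something, handlers it notifies)
Task = Tuple[str, Callable[[], bool], List[str]]


def run_play(tasks: List[Task], handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Simplified model of task/handler ordering within one play on one host."""
    log: List[str] = []
    notified = set()
    for name, action, notify in tasks:
        changed = action()
        log.append(f"TASK [{name}] {'changed' if changed else 'ok'}")
        if changed:
            notified.update(notify)  # notifying twice still runs the handler only once
    # Handlers run after all tasks, once each, in the order they are defined.
    for handler_name, handler in handlers.items():
        if handler_name in notified:
            handler()
            log.append(f"HANDLER [{handler_name}]")
    return log


# Hypothetical run: the yum task upgraded httpd (changed) and notifies "restart apache".
tasks: List[Task] = [
    ("ensure apache is at the latest version", lambda: True, ["restart apache"]),
    ("ensure apache is running", lambda: False, []),
]
handlers = {"restart apache": lambda: None}
log = run_play(tasks, handlers)
for line in log:
    print(line)
```

If the yum task reports `ok` (httpd already at the latest version), nothing is notified and apache is not restarted, which is the point of using a handler instead of an unconditional restart task.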

NAME

yum - Yellowdog Updater Modified

SYNOPSIS

yum [options] [command] [package …]

DESCRIPTION

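yum is an interactive, rpm-based package manager. It can automatically perform system updates, including dependency analysis and obsolete processing based on repository metadata. It can also install new packages, remove old packages, and query installed and/or available packages. yum is similar to other high-level package managers such as apt-get and smart.

command is one of: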

* install package1 [package2] [...]
* update [package1] [package2] [...]
* update-to [package1] [package2] [...]
* update-minimal [package1] [package2] [...]
* check-update
* upgrade [package1] [package2] [...]
* upgrade-to [package1] [package2] [...]
* distribution-synchronization [package1] [package2] [...]
* remove | erase package1 [package2] [...]
* autoremove [package1] [...]
* list [...]
* info [...]
* provides | whatprovides feature1 [feature2] [...]
* clean [ packages | metadata | expire-cache | rpmdb | plugins | all ]
* makecache [fast]
* groups [...]
* search string1 [string2] [...]
* shell [filename]
* resolvedep dep1 [dep2] [...]
(maintained for legacy reasons only - use repoquery or yum provides)
* localinstall rpmfile1 [rpmfile2] [...]
(maintained for legacy reasons only - use install)
* localupdate rpmfile1 [rpmfile2] [...]
(maintained for legacy reasons only - use update)
* reinstall package1 [package2] [...]
* downgrade package1 [package2] [...]
* deplist package1 [package2] [...]
* repolist [all|enabled|disabled]
* repoinfo [all|enabled|disabled]
* repository-packages <enabled-repoid> <install|remove|remove-or-reinstall|remove-or-distribution-synchronization> [package2] [...]
* version [ all | installed | available | group-* | nogroups* | grouplist | groupinfo ]
* history [info|list|packages-list|packages-info|summary|addon-info|redo|undo|rollback|new|sync|stats]
* load-transaction [txfile]
* updateinfo [summary | list | info | remove-pkgs-ts | exclude-updates | exclude-all | check-running-kernel]
* fssnapshot [summary | list | have-space | create | delete]
* fs [filters | refilter | refilter-cleanup | du]
* check
* help [command]

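Unless the --help or -h option is given, one of the commands above must be present.

Commonly used commands:
install
Installs the latest version of a package or group of packages, making sure all dependencies are satisfied.

update
If run without any packages, update updates every currently installed package. If one or more packages are specified, yum updates only the listed packages. While updating, yum makes sure all dependencies are satisfied.

update-to
Works like "update", but lets you specify the package version to update to.

check-update
Checks whether any packages have updates available.

remove or erase
Removes the specified packages from the system.

autoremove
Removes the specified packages from the system, together with dependencies that are no longer needed.

list
Lists various information about available packages.

provides or whatprovides
Finds out which package provides a given feature or file.

search
Used to find a package when you know something about it but are not sure of its name. By default, search tries only package names and summaries; if that "fails", it then tries descriptions and URLs.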


Python pip

Installing Python pip

yum -y install python-setuptools
easy_install pip
or
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python get-pip.py

Upgrading packages with pip

pip install --upgrade setuptools -i https://pypi.tuna.tsinghua.edu.cn/simple

If an upgrade fails with an error like the one below, reinstall while ignoring the already-installed packages (-I):
ERROR: Cannot uninstall 'pyparsing'. It is a distutils installed project and thus we cannot accurately determine which files belong to it which would lead to only a partial uninstall.
sudo pip install -I tox -i https://pypi.tuna.tsinghua.edu.cn/simple

Using a China mirror with pip

pip install setuptools -i https://pypi.tuna.tsinghua.edu.cn/simple

Or add the following to ~/.pip/pip.conf
[global]
index-url=http://pypi.douban.com/simple/
trusted-host=pypi.douban.com
#index-url=https://pypi.tuna.tsinghua.edu.cn/simple

Building and installing an open-source Python project from source

1. Download the project's source code
2. Install its dependencies
pip install -r requirements.txt
3. Install the project
python setup.py install

Installing a .whl file with pip

pip install some-package.whl

Installing python3

yum -y install python36 python36-tools

Installing pip for python3
curl "https://bootstrap.pypa.io/get-pip.py" -o "get-pip.py"
python3.6 get-pip.py

Upgrading to GCC 7 on CentOS 7

1. Check which C++ standard g++ uses by default:
g++ -dM -E -x c++ /dev/null | grep -F __cplusplus

2. Install devtoolset-7
yum install -y centos-release-scl && yum install -y devtoolset-7-gcc-c++

3. Use GCC 7 (this opens a new shell with GCC 7 on the PATH)
scl enable devtoolset-7 bash
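The value printed by step 1 identifies the default language standard. A small helper that maps it to a name, as a sketch: the numbers are the standard `__cplusplus` values, while the function itself is a hypothetical convenience. As far as I know, CentOS 7's stock GCC 4.8.5 reports 199711L (C++98 by default), whereas GCC 7 defaults to C++14 (201402L).

```python
import re

# __cplusplus values defined by the C++ standards.
CPLUSPLUS_STANDARDS = {
    199711: "C++98/03",
    201103: "C++11",
    201402: "C++14",
    201703: "C++17",
}


def default_cxx_standard(macro_dump: str) -> str:
    """Map the output of `g++ -dM -E -x c++ /dev/null | grep -F __cplusplus` to a name."""
    m = re.search(r"#define __cplusplus (\d+)L", macro_dump)
    if m is None:
        raise ValueError("__cplusplus not found in macro dump")
    value = int(m.group(1))
    return CPLUSPLUS_STANDARDS.get(value, f"unknown ({value})")


print(default_cxx_standard("#define __cplusplus 199711L"))  # C++98/03
print(default_cxx_standard("#define __cplusplus 201402L"))  # C++14
```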

Installing the EPEL repository on CentOS

sudo yum install -y epel-release

Creating nested directories in bulk on Linux

mkdir -pv roles/vdbench/{tasks,templates,meta,defaults,vars,files,plugins,handlers}
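The same role skeleton can be created from Python with pathlib, for example in a setup script. The directory list mirrors the mkdir command, using `handlers` (plural), which is the directory name Ansible looks for; `plugins` is kept only because the command creates it. The function name is hypothetical, and `ansible-galaxy init <role>` is the standard tool if you just want a skeleton.

```python
import tempfile
from pathlib import Path
from typing import List

ROLE_SUBDIRS = ["tasks", "templates", "meta", "defaults", "vars", "files", "plugins", "handlers"]


def create_role_skeleton(base: Path, role: str) -> List[Path]:
    """Create roles/<role>/<subdir> for every subdirectory, like mkdir -p."""
    created = []
    for sub in ROLE_SUBDIRS:
        path = base / "roles" / role / sub
        path.mkdir(parents=True, exist_ok=True)  # creates parents; no error if it already exists
        created.append(path)
    return created


with tempfile.TemporaryDirectory() as tmp:
    create_role_skeleton(Path(tmp), "vdbench")
    made = sorted(p.name for p in (Path(tmp) / "roles" / "vdbench").iterdir() if p.is_dir())
print(made)
```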

Compression

  • tar.gz format
Compress (pack the ceph-14.2.1 directory into ceph-14.2.1.tar.gz): tar zcf ceph-14.2.1.tar.gz ceph-14.2.1
Decompress: tar zxf ceph-14.2.1.tar.gz
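The same round trip with Python's standard tarfile module, useful when calling the tar binary is inconvenient. The directory and file names in the demo are illustrative.

```python
import tarfile
import tempfile
from pathlib import Path


def make_tar_gz(src_dir: str, archive: str) -> None:
    """Roughly: tar zcf <archive> <src_dir> (entries stored under the directory's base name)."""
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src_dir, arcname=Path(src_dir).name)


def extract_tar_gz(archive: str, dest: str = ".") -> None:
    """Roughly: cd <dest> && tar zxf <archive>"""
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(dest, filter="data")  # rejects absolute paths and path traversal
        else:
            tar.extractall(dest)  # older Pythons without extraction filters


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "ceph-14.2.1")
    src.mkdir()
    (src / "README").write_text("hello\n")
    archive = str(Path(tmp, "ceph-14.2.1.tar.gz"))
    make_tar_gz(str(src), archive)
    out = Path(tmp, "out")
    out.mkdir()
    extract_tar_gz(archive, str(out))
    roundtrip = (out / "ceph-14.2.1" / "README").read_text()
print(repr(roundtrip))
```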

Checking network bandwidth with iperf

sudo yum install -y epel-release
sudo yum install -y iperf
Server
iperf -s -p 12345 -i 1 -M
Client
iperf -c 192.168.1.10 -p 12345 -i 1 -t 600 -w 100M

Adding a repository on Ubuntu 18.04

sudo apt update
sudo apt install software-properties-common
sudo add-apt-repository --yes --update ppa:ansible/ansible-2.8 (or: sudo apt-add-repository --yes --update ppa:ansible/ansible)
sudo apt install ansible

Repository page
https://launchpad.net/~ansible/+archive/ubuntu/ansible-2.8

Checking the disk type

cat /sys/block/sda/queue/rotational
0 means SSD, 1 means HDD
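To check every disk at once instead of only sda, a small Python sketch that walks /sys/block/*/queue/rotational (the function name is hypothetical). Keep in mind that behind some RAID controllers or in virtual machines the flag may not reflect the real media.

```python
import tempfile
from pathlib import Path
from typing import Dict


def disk_types(sys_block: str = "/sys/block") -> Dict[str, str]:
    """Classify block devices by <dev>/queue/rotational: 0 -> ssd, anything else -> hdd."""
    result = {}
    for flag in sorted(Path(sys_block).glob("*/queue/rotational")):
        device = flag.parent.parent.name
        result[device] = "ssd" if flag.read_text().strip() == "0" else "hdd"
    return result


# Demo against a fake sysfs tree; on a real host just call disk_types().
with tempfile.TemporaryDirectory() as fake_sys_block:
    for dev, value in [("sda", "1"), ("nvme0n1", "0")]:
        queue = Path(fake_sys_block, dev, "queue")
        queue.mkdir(parents=True)
        (queue / "rotational").write_text(value + "\n")
    demo = disk_types(fake_sys_block)
print(demo)
```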

Ceph scrub-related configuration

OPTION(mon_warn_not_scrubbed, OPT_INT)
OPTION(mon_warn_not_deep_scrubbed, OPT_INT)
OPTION(mon_scrub_interval, OPT_INT) // default: once a day
OPTION(mon_scrub_timeout, OPT_INT) // default: 5 minutes
OPTION(mon_scrub_max_keys, OPT_INT) // maximum number of keys per scrub
OPTION(mon_scrub_inject_crc_mismatch, OPT_DOUBLE) // probability of injecting a CRC mismatch [0.0, 1.0]
OPTION(mon_scrub_inject_missing_keys, OPT_DOUBLE) // probability of injecting missing keys [0.0, 1.0]
OPTION(mds_max_scrub_ops_in_progress, OPT_INT) // maximum number of scrub operations run in parallel (number of scrubs allowed at the same time)

OPTION(osd_op_queue_mclock_scrub_res, OPT_DOUBLE)
OPTION(osd_op_queue_mclock_scrub_wgt, OPT_DOUBLE)
OPTION(osd_op_queue_mclock_scrub_lim, OPT_DOUBLE)
OPTION(osd_scrub_invalid_stats, OPT_BOOL)
OPTION(osd_max_scrubs, OPT_INT) // maximum number of concurrent scrubs on a single OSD
OPTION(osd_scrub_during_recovery, OPT_BOOL) // allow scrubbing while PGs on the OSD are recovering
OPTION(osd_scrub_begin_hour, OPT_INT) // restrict scrubbing to start at this hour of the day
OPTION(osd_scrub_end_hour, OPT_INT) // restrict scrubbing to end at this hour of the day
OPTION(osd_scrub_begin_week_day, OPT_INT) // restrict scrubbing to start on this day of the week (0 or 7 = Sunday, 1 = Monday, etc.)
OPTION(osd_scrub_end_week_day, OPT_INT) // restrict scrubbing to end on this day of the week (0 or 7 = Sunday, 1 = Monday, etc.)
OPTION(osd_scrub_load_threshold, OPT_FLOAT) // allow scrubbing only when the system load divided by the number of CPUs is below this value
OPTION(osd_scrub_min_interval, OPT_FLOAT) // scrub each PG no more often than this interval (if load is low)
OPTION(osd_scrub_max_interval, OPT_FLOAT) // scrub each PG no less often than this interval (regardless of load)
OPTION(osd_scrub_interval_randomize_ratio, OPT_FLOAT) // ratio by which scrub intervals are randomly varied; this prevents a scrub stampede by randomizing intervals so scrubs soon spread evenly over the week. Scheduled scrubs are randomized in the range [min, min * (1 + randomize_ratio))
OPTION(osd_scrub_backoff_ratio, OPT_DOUBLE) // backoff ratio after a failed attempt to schedule a scrub (the probability to back off the scheduled scrub)
OPTION(osd_scrub_chunk_min, OPT_INT) // minimum number of objects to scrub in a single chunk
OPTION(osd_scrub_chunk_max, OPT_INT) // maximum number of objects to scrub in a single chunk
OPTION(osd_scrub_sleep, OPT_FLOAT) // delay injected during an ongoing scrub (sleep between deep scrubbing operations)
OPTION(osd_scrub_auto_repair, OPT_BOOL) // automatically repair damaged objects detected during scrub (whether to repair inconsistencies automatically during deep scrubbing)
OPTION(osd_scrub_auto_repair_num_errors, OPT_U32) // maximum number of detected errors to repair automatically (repair only when the error count is below this threshold)
OPTION(osd_deep_scrub_interval, OPT_FLOAT) // deep-scrub each PG, i.e. verify data checksums (once a week)
OPTION(osd_deep_scrub_randomize_ratio, OPT_FLOAT) // ratio by which deep scrubs are randomly triggered to prevent deep-scrub stampedes: when no user-initiated scrub is pending, a scrub randomly becomes a deep scrub at this rate (0.15 -> 15% deep scrubs)
OPTION(osd_deep_scrub_stride, OPT_INT) // number of bytes read from an object at a time during deep scrub
OPTION(osd_deep_scrub_keys, OPT_INT) // number of keys read from an object at a time during deep scrub
OPTION(osd_deep_scrub_update_digest_min_age, OPT_INT) // update the whole-object digest only if the object was last modified longer ago than this
OPTION(osd_deep_scrub_large_omap_object_key_threshold, OPT_U64) // warn when an object has more omap keys than this threshold
OPTION(osd_deep_scrub_large_omap_object_value_sum_threshold, OPT_U64) // warn when an object has more omap key bytes than this threshold
OPTION(osd_debug_deep_scrub_sleep, OPT_FLOAT) // inject sleep during deep scrub IO to make it easier to induce preemption
OPTION(osd_scrub_priority, OPT_U32) // priority of scrub operations in the work queue
OPTION(osd_scrub_cost, OPT_U32) // cost of scrub operations in the work queue (the default cost equals 50MB of IO; what "cost" means exactly needs a look at the code)
OPTION(osd_requested_scrub_priority, OPT_U32) // set the requested-scrub priority above the scrub priority so that requested scrubs jump the queue of scheduled scrubs
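Two of the options above are easy to misread: the begin/end hour window may wrap past midnight, and osd_scrub_interval_randomize_ratio stretches the minimum interval by a random factor. The Python below sketches that scheduling arithmetic as described by the option comments; it is not Ceph's code, the function names are hypothetical, and the 1-day minimum and 0.5 ratio in the demo are what I believe the Nautilus defaults to be (check `ceph config help osd_scrub_min_interval` on your version).

```python
import random
from typing import List


def scrub_time_permit(hour: int, begin_hour: int, end_hour: int) -> bool:
    """May a scheduled scrub start at `hour`, given osd_scrub_begin_hour / osd_scrub_end_hour?"""
    if begin_hour < end_hour:
        # Ordinary window inside one day, e.g. 1 -> 5.
        return begin_hour <= hour < end_hour
    # Window wrapping past midnight, e.g. 22 -> 6; begin == end allows every hour.
    return hour >= begin_hour or hour < end_hour


def next_scrub_delay(min_interval: float, randomize_ratio: float, rng: random.Random) -> float:
    """Delay until the next regular scrub, drawn from [min, min * (1 + randomize_ratio))."""
    return min_interval + min_interval * randomize_ratio * rng.random()


# Demo: a night-time window and five randomized delays (1-day minimum, ratio 0.5).
window_22_to_6 = [h for h in range(24) if scrub_time_permit(h, 22, 6)]
rng = random.Random(42)
delays: List[float] = [next_scrub_delay(86400, 0.5, rng) for _ in range(5)]
print(window_22_to_6)
print([round(d / 3600, 1) for d in delays])  # hours, each within [24, 36)
```

Spreading the delays over [24h, 36h) is what keeps PGs created at the same moment from all becoming due for scrubbing at the same moment later on.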

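Tentative conclusions:

1. Silent errors currently cannot be avoided, and Scrub cannot repair serious silent errors either

2. Scrub is Ceph's mechanism for detecting silent errors

3. Once silent errors are found, there is currently no good way to repair them (the only option is to notify the customer to recover as much data as possible)

Ideas:

1. Use Scrub as a tool for testing correctness and consistency (the overwrite problem has to be solved first)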

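To join the group, add WeChat: yjtcok (note: WeChat ticket grabbing)

This group exists only for mutual help with speeding up ticket grabbing and provides no other services. It does not take part in any shopping or price haggling; please be aware, thank you.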

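node_exporter metrics, compiled as of 2019-05-24.

Collectors

Collector support differs between operating systems. The tables below list all existing collectors and the systems they support. Collectors are enabled with the --collector.<name> flag; collectors that are enabled by default can be disabled with the --no-collector.<name> flag.

Enabled by default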

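Name | Description | OS
arp | ARP statistics from /proc/net/arp | Linux
bcache | bcache statistics from /sys/fs/bcache/ | Linux
bonding | Number of configured and active slaves of Linux bonding interfaces | Linux
boottime | System boot time derived from the kern.boottime sysctl | Darwin, Dragonfly, FreeBSD, NetBSD, OpenBSD, Solaris
conntrack | Shows conntrack statistics (does nothing if /proc/sys/net/netfilter/ is not present) | Linux
cpu | CPU statistics | Darwin, Dragonfly, FreeBSD, Linux, Solaris
cpufreq | CPU frequency statistics | Linux, Solaris
diskstats | Disk I/O statistics | Darwin, Linux, OpenBSD
edac | Error detection and correction statistics | Linux
entropy | Available entropy | Linux
exec | Execution statistics | Dragonfly, FreeBSD
filefd | File descriptor statistics from /proc/sys/fs/file-nr | Linux
filesystem | Filesystem statistics, such as disk space used | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
hwmon | Hardware monitoring and sensor data from /sys/class/hwmon/ | Linux
infiniband | Network statistics specific to InfiniBand and Intel OmniPath configurations | Linux
ipvs | IPVS status from /proc/net/ip_vs and statistics from /proc/net/ip_vs_stats | Linux
loadavg | Load average | Darwin, Dragonfly, FreeBSD, Linux, NetBSD, OpenBSD, Solaris
mdadm | Statistics about devices in /proc/mdstat (does nothing if /proc/mdstat is not present) | Linux
meminfo | Memory statistics | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netclass | Network interface info from /sys/class/net/ | Linux
netdev | Network interface statistics, such as bytes transferred | Darwin, Dragonfly, FreeBSD, Linux, OpenBSD
netstat | Network statistics from /proc/net/netstat; the same information as netstat -s | Linux
nfs | Exposes NFS client statistics from /proc/net/rpc/nfs; the same information as nfsstat -c | Linux
nfsd | Exposes NFS kernel server statistics from /proc/net/rpc/nfsd; the same information as nfsstat -s | Linux
pressure | Pressure stall statistics from /proc/pressure/ | Linux (kernel 4.20+ and/or CONFIG_PSI)
sockstat | Exposes various statistics from /proc/net/sockstat | Linux
stat | Various statistics from /proc/stat, including boot time, forks and interrupts | Linux
textfile | Exposes statistics read from local disk; the --collector.textfile.directory flag must be set | any
time | The current system time | any
timex | Selected adjtimex(2) system call statistics | Linux
uname | System information as provided by the uname system call | FreeBSD, Linux
vmstat | Statistics from /proc/vmstat | Linux
xfs | XFS runtime statistics | Linux (kernel 4.4+)
zfs | ZFS performance statistics | Linux, Solaris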

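Disabled by default

Because of kernel configuration and security settings, the perf collector may not work out of the box on all Linux systems. To allow access, set the following sysctl parameter: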

sysctl -w kernel.perf_event_paranoid=X

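2 Only allow user-space measurements (the default since Linux 4.6).

1 Allow both kernel and user measurements (the default before Linux 4.6).

0 Allow access to CPU-specific data but not to raw tracepoint samples.

-1 No restrictions.

Different metrics are available depending on the configured value; in most cases 0 provides the most complete set. For more information, see man 2 perf_event_open.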

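Name | Description | OS
buddyinfo | Memory fragmentation statistics as reported by /proc/buddyinfo | Linux
devstat | Device statistics | Dragonfly, FreeBSD
drbd | Distributed Replicated Block Device statistics (up to version 8.4) | Linux
interrupts | Detailed interrupt statistics | Linux, OpenBSD
ksmd | Kernel and system statistics from /sys/kernel/mm/ksm | Linux
logind | Session counts from logind | Linux
meminfo_numa | Memory statistics from /proc/meminfo_numa | Linux
mountstats | Filesystem statistics from /proc/self/mountstats, including detailed NFS client statistics | Linux
ntp | Time health check of the local NTP daemon | any
processes | Aggregate process statistics from /proc | Linux
qdisc | Queuing discipline statistics | Linux
runit | Service status from runit | any
supervisord | Service status from supervisord | any
systemd | Service and system status from systemd | Linux
tcpstat | TCP connection status information from /proc/net/tcp and /proc/net/tcp6 (warning: the current version has potential performance issues under high load) | Linux
wifi | WiFi device and station statistics | Linux
perf | Perf-based metrics (warning: the metrics depend on kernel configuration and settings) | Linux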

Textfile Collector

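The textfile collector is similar to the Pushgateway in that it allows exporting statistics from batch jobs. It can also be used to export static metrics, such as which role a machine has. The Pushgateway should be used for service-level metrics; the textfile collector is for metrics tied to a machine.

To use it, set the --collector.textfile.directory flag on the Node exporter. The collector parses every file in that directory matching the glob *.prom using the text format. Note: timestamps are not supported.

To atomically push the completion time of a cron job: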

echo my_batch_job_completion_time $(date +%s) > /path/to/directory/my_batch_job.prom.$$
mv /path/to/directory/my_batch_job.prom.$$ /path/to/directory/my_batch_job.prom

To statically set a machine's role using labels:

echo 'role{role="application_server"} 1' > /path/to/directory/role.prom.$$
mv /path/to/directory/role.prom.$$ /path/to/directory/role.prom
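The echo + mv pattern works because the collector only reads files matching *.prom, and mv within one filesystem is an atomic rename, so a half-written file is never visible. A Python sketch of the same technique, e.g. for a batch job written in Python; the function name is hypothetical and the temporary directory stands in for the real --collector.textfile.directory.

```python
import os
import tempfile
import time
from typing import List


def write_textfile_metrics(directory: str, name: str, lines: List[str]) -> str:
    """Publish <name>.prom for the textfile collector without exposing a partial file."""
    final_path = os.path.join(directory, name + ".prom")
    # The temp name does not end in .prom, so the collector's *.prom glob ignores it.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            f.write("\n".join(lines) + "\n")
        os.chmod(tmp_path, 0o644)  # mkstemp creates 0600; let node_exporter's user read it
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path


with tempfile.TemporaryDirectory() as d:
    write_textfile_metrics(d, "my_batch_job", [f"my_batch_job_completion_time {int(time.time())}"])
    write_textfile_metrics(d, "role", ['role{role="application_server"} 1'])
    listing = sorted(os.listdir(d))
    with open(os.path.join(d, "role.prom")) as f:
        role_content = f.read()
print(listing)
```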

Filtering enabled collectors

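By default, node_exporter exposes all metrics from the enabled collectors. This is the recommended way to collect metrics, because it avoids errors when comparing metrics of different families.

For advanced use, node_exporter can be passed an optional list of collectors to filter metrics. The collect[] parameter may be used multiple times. In the Prometheus configuration you can use this syntax under the scrape config: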

params:
  collect[]:
    - foo
    - bar

This is useful for having different Prometheus servers collect specific metrics from nodes.
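With that params block, Prometheus adds one collect[] query parameter per entry when it scrapes. A Python sketch that builds the equivalent URL, for example to try a filter with curl before changing the scrape config; localhost:9100 (node_exporter's default port) and the function name are assumptions.

```python
from urllib.parse import urlencode
from typing import List


def scrape_url(target: str, collectors: List[str]) -> str:
    """node_exporter /metrics URL restricted to the given collectors (one collect[] each)."""
    query = urlencode([("collect[]", c) for c in collectors])  # brackets become %5B%5D
    return f"http://{target}/metrics?{query}"


url = scrape_url("localhost:9100", ["foo", "bar"])
print(url)  # http://localhost:9100/metrics?collect%5B%5D=foo&collect%5B%5D=bar
```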

Some of the metrics are listed in the table below

Metrics Chinese explanation English explanation
node_arp_entries device的ARP表项 # HELP node_arp_entries ARP entries by device
node_boot_time_seconds 节点启动时间,unixtime # HELP node_boot_time_seconds Node boot time, in unixtime.
node_context_switches_total context switches(上下文切换)的总数 # HELP node_context_switches_total Total number of context switches.
node_cpu_guest_seconds_total 每种模式在guests(VM)上花费CPU的秒数 # HELP node_cpu_guest_seconds_total Seconds the cpus spent in guests (VMs) for each mode.
node_cpu_seconds_total 在每种模式下花费CPU的秒数 # HELP node_cpu_seconds_total Seconds the cpus spent in each mode.
node_disk_io_now 当前正在进行的I/O数量 # HELP node_disk_io_now The number of I/Os currently in progress.
node_disk_io_time_seconds_total 执行I/O所花费的总时间 # HELP node_disk_io_time_seconds_total Total seconds spent doing I/Os.
node_disk_io_time_weighted_seconds_total 进行I/O所花费的加权秒数 # HELP node_disk_io_time_weighted_seconds_total The weighted # of seconds spent doing I/Os.
node_disk_read_bytes_total 成功读取的总字节数 # HELP node_disk_read_bytes_total The total number of bytes read successfully.
node_disk_read_time_seconds_total 所有读取花费的总秒数 # HELP node_disk_read_time_seconds_total The total number of seconds spent by all reads.
node_disk_reads_completed_total 成功完成的读取总数 # HELP node_disk_reads_completed_total The total number of reads completed successfully.
node_disk_reads_merged_total 读合并的次数 # HELP node_disk_reads_merged_total The total number of reads merged.
node_disk_write_time_seconds_total 这是所有写入花费的总秒数 # HELP node_disk_write_time_seconds_total This is the total number of seconds spent by all writes.
node_disk_writes_completed_total 成功完成的写入总数 # HELP node_disk_writes_completed_total The total number of writes completed successfully.
node_disk_writes_merged_total 写合并的次数 # HELP node_disk_writes_merged_total The number of writes merged.
node_disk_written_bytes_total 成功写入的总字节数 # HELP node_disk_written_bytes_total The total number of bytes written successfully.
node_entropy_available_bits 可用entropy的Bits(比特) # HELP node_entropy_available_bits Bits of available entropy.
node_exporter_build_info 构建node_exporter的版本,修订版,分支和goversion标记,具有常量值“1”的metric # HELP node_exporter_build_info A metric with a constant ‘1’ value labeled by version, revision, branch, and goversion from which node_exporter was built.
node_filefd_allocated 文件描述符统计:已分配 # HELP node_filefd_allocated File descriptor statistics: allocated.
node_filefd_maximum 文件描述符统计:最大值 # HELP node_filefd_maximum File descriptor statistics: maximum.
node_filesystem_avail_bytes 非root用户可用的文件系统空间(以字节为单位) # HELP node_filesystem_avail_bytes Filesystem space available to non-root users in bytes.
node_filesystem_device_error 获取指定设备的统计信息时是否发生错误 # HELP node_filesystem_device_error Whether an error occurred while getting statistics for the given device.
node_filesystem_files 文件系统总的file nodes数(inode) # HELP node_filesystem_files Filesystem total file nodes.
node_filesystem_files_free 文件系统空闲file node数(inode) # HELP node_filesystem_files_free Filesystem total free file nodes.
node_filesystem_free_bytes 文件系统可用空间,以字节为单位 # HELP node_filesystem_free_bytes Filesystem free space in bytes.
node_filesystem_readonly 文件系统只读状态 # HELP node_filesystem_readonly Filesystem read-only status.
node_filesystem_size_bytes 文件系统大小(字节) # HELP node_filesystem_size_bytes Filesystem size in bytes.
node_forks_total forks总数 # HELP node_forks_total Total number of forks.
node_hwmon_chip_names 人类可读(可读性良好)的芯片名称的注释指标 # HELP node_hwmon_chip_names Annotation metric for human-readable chip names
node_hwmon_sensor_label 给定芯片和传感器的标签 # HELP node_hwmon_sensor_label Label for given chip and sensor
node_hwmon_temp_celsius 温度硬件监视器(输入) # HELP node_hwmon_temp_celsius Hardware monitor for temperature (input)
node_hwmon_temp_crit_alarm_celsius 温度硬件监控器(crit_alarm) # HELP node_hwmon_temp_crit_alarm_celsius Hardware monitor for temperature (crit_alarm)
node_hwmon_temp_crit_celsius 温度硬件监控器(暴击) # HELP node_hwmon_temp_crit_celsius Hardware monitor for temperature (crit)
node_hwmon_temp_max_celsius 温度硬件监控器(最大) # HELP node_hwmon_temp_max_celsius Hardware monitor for temperature (max)
node_intr_total 服务的中断总数 # HELP node_intr_total Total number of interrupts serviced.
node_load1 1m负载平均值 # HELP node_load1 1m load average.
node_load15 15m负载平均值 # HELP node_load15 15m load average.
node_load5 5m负载平均值 # HELP node_load5 5m load average.
node_memory_Active_anon_bytes 内存信息字段Active_anon_bytes # HELP node_memory_Active_anon_bytes Memory information field Active_anon_bytes.
node_memory_Active_bytes 内存信息字段Active_bytes # HELP node_memory_Active_bytes Memory information field Active_bytes.
node_memory_Active_file_bytes 内存信息字段Active_file_bytes # HELP node_memory_Active_file_bytes Memory information field Active_file_bytes.
node_memory_AnonHugePages_bytes 内存信息字段AnonHugePages_bytes # HELP node_memory_AnonHugePages_bytes Memory information field AnonHugePages_bytes.
node_memory_AnonPages_bytes 内存信息字段AnonPages_bytes # HELP node_memory_AnonPages_bytes Memory information field AnonPages_bytes.
node_memory_Bounce_bytes 内存信息字段Bounce_bytes # HELP node_memory_Bounce_bytes Memory information field Bounce_bytes.
node_memory_Buffers_bytes 内存信息字段Buffers_bytes # HELP node_memory_Buffers_bytes Memory information field Buffers_bytes.
node_memory_Cached_bytes 内存信息字段Cached_bytes # HELP node_memory_Cached_bytes Memory information field Cached_bytes.
node_memory_CmaFree_bytes 内存信息字段CmaFree_bytes # HELP node_memory_CmaFree_bytes Memory information field CmaFree_bytes.
node_memory_CmaTotal_bytes 内存信息字段CmaTotal_bytes # HELP node_memory_CmaTotal_bytes Memory information field CmaTotal_bytes.
node_memory_CommitLimit_bytes 内存信息字段CommitLimit_bytes # HELP node_memory_CommitLimit_bytes Memory information field CommitLimit_bytes.
node_memory_Committed_AS_bytes 内存信息字段Committed_AS_bytes # HELP node_memory_Committed_AS_bytes Memory information field Committed_AS_bytes.
node_memory_DirectMap1G_bytes 内存信息字段DirectMap1G_bytes # HELP node_memory_DirectMap1G_bytes Memory information field DirectMap1G_bytes.
node_memory_DirectMap2M_bytes 内存信息字段DirectMap2M_bytes # HELP node_memory_DirectMap2M_bytes Memory information field DirectMap2M_bytes.
node_memory_DirectMap4k_bytes 内存信息字段DirectMap4k_bytes # HELP node_memory_DirectMap4k_bytes Memory information field DirectMap4k_bytes.
node_memory_Dirty_bytes 内存信息字段Dirty_bytes # HELP node_memory_Dirty_bytes Memory information field Dirty_bytes.
node_memory_HardwareCorrupted_bytes 内存信息字段HardwareCorrupted_bytes # HELP node_memory_HardwareCorrupted_bytes Memory information field HardwareCorrupted_bytes.
node_memory_HugePages_Free 内存信息字段HugePages_Free # HELP node_memory_HugePages_Free Memory information field HugePages_Free.
node_memory_HugePages_Rsvd 内存信息字段HugePages_Rsvd # HELP node_memory_HugePages_Rsvd Memory information field HugePages_Rsvd.
node_memory_HugePages_Surp 内存信息字段HugePages_Surp # HELP node_memory_HugePages_Surp Memory information field HugePages_Surp.
node_memory_HugePages_Total 内存信息字段HugePages_Total # HELP node_memory_HugePages_Total Memory information field HugePages_Total.
node_memory_Hugepagesize_bytes 内存信息字段Hugepagesize_bytes # HELP node_memory_Hugepagesize_bytes Memory information field Hugepagesize_bytes.
node_memory_Inactive_anon_bytes 内存信息字段Inactive_anon_bytes # HELP node_memory_Inactive_anon_bytes Memory information field Inactive_anon_bytes.
node_memory_Inactive_bytes 内存信息字段Inactive_bytes # HELP node_memory_Inactive_bytes Memory information field Inactive_bytes.
node_memory_Inactive_file_bytes 内存信息字段Inactive_file_bytes # HELP node_memory_Inactive_file_bytes Memory information field Inactive_file_bytes.
node_memory_KernelStack_bytes 内存信息字段KernelStack_bytes # HELP node_memory_KernelStack_bytes Memory information field KernelStack_bytes.
node_memory_Mapped_bytes 内存信息字段Mapped_bytes # HELP node_memory_Mapped_bytes Memory information field Mapped_bytes.
node_memory_MemAvailable_bytes 内存信息字段MemAvailable_bytes # HELP node_memory_MemAvailable_bytes Memory information field MemAvailable_bytes.
node_memory_MemFree_bytes 内存信息字段MemFree_bytes # HELP node_memory_MemFree_bytes Memory information field MemFree_bytes.
node_memory_MemTotal_bytes 内存信息字段MemTotal_bytes # HELP node_memory_MemTotal_bytes Memory information field MemTotal_bytes.
node_memory_Mlocked_bytes 内存信息字段Mlocked_bytes # HELP node_memory_Mlocked_bytes Memory information field Mlocked_bytes.
node_memory_NFS_Unstable_bytes 内存信息字段NFS_Unstable_bytes # HELP node_memory_NFS_Unstable_bytes Memory information field NFS_Unstable_bytes.
node_memory_PageTables_bytes 内存信息字段PageTables_bytes # HELP node_memory_PageTables_bytes Memory information field PageTables_bytes.
node_memory_SReclaimable_bytes 内存信息字段SReclaimable_bytes # HELP node_memory_SReclaimable_bytes Memory information field SReclaimable_bytes.
node_memory_SUnreclaim_bytes 内存信息字段SUnreclaim_bytes # HELP node_memory_SUnreclaim_bytes Memory information field SUnreclaim_bytes.
node_memory_Shmem_bytes 内存信息字段Shmem_bytes # HELP node_memory_Shmem_bytes Memory information field Shmem_bytes.
node_memory_Slab_bytes 内存信息字段Slab_bytes # HELP node_memory_Slab_bytes Memory information field Slab_bytes.
node_memory_SwapCached_bytes 内存信息字段SwapCached_bytes # HELP node_memory_SwapCached_bytes Memory information field SwapCached_bytes.
node_memory_SwapFree_bytes 内存信息字段SwapFree_bytes # HELP node_memory_SwapFree_bytes Memory information field SwapFree_bytes.
node_memory_SwapTotal_bytes 内存信息字段SwapTotal_bytes # HELP node_memory_SwapTotal_bytes Memory information field SwapTotal_bytes.
node_memory_Unevictable_bytes 内存信息字段Unevictable_bytes # HELP node_memory_Unevictable_bytes Memory information field Unevictable_bytes.
node_memory_VmallocChunk_bytes 内存信息字段VmallocChunk_bytes # HELP node_memory_VmallocChunk_bytes Memory information field VmallocChunk_bytes.
node_memory_VmallocTotal_bytes 内存信息字段VmallocTotal_bytes # HELP node_memory_VmallocTotal_bytes Memory information field VmallocTotal_bytes.
node_memory_VmallocUsed_bytes 内存信息字段VmallocUsed_bytes # HELP node_memory_VmallocUsed_bytes Memory information field VmallocUsed_bytes.
node_memory_WritebackTmp_bytes 内存信息字段WritebackTmp_bytes # HELP node_memory_WritebackTmp_bytes Memory information field WritebackTmp_bytes.
node_memory_Writeback_bytes 内存信息字段Writeback_bytes # HELP node_memory_Writeback_bytes Memory information field Writeback_bytes.
node_netstat_Icmp6_InErrors 统计Icmp6InErrors # HELP node_netstat_Icmp6_InErrors Statistic Icmp6InErrors.
node_netstat_Icmp6_InMsgs 统计Icmp6InMsgs # HELP node_netstat_Icmp6_InMsgs Statistic Icmp6InMsgs.
node_netstat_Icmp6_OutMsgs 统计Icmp6OutMsgs # HELP node_netstat_Icmp6_OutMsgs Statistic Icmp6OutMsgs.
node_netstat_Icmp_InErrors 统计IcmpInErrors # HELP node_netstat_Icmp_InErrors Statistic IcmpInErrors.
node_netstat_Icmp_InMsgs 统计IcmpInMsgs # HELP node_netstat_Icmp_InMsgs Statistic IcmpInMsgs.
node_netstat_Icmp_OutMsgs 统计IcmpOutMsgs # HELP node_netstat_Icmp_OutMsgs Statistic IcmpOutMsgs.
node_netstat_Ip6_InOctets 统计Ip6InOctets # HELP node_netstat_Ip6_InOctets Statistic Ip6InOctets.
node_netstat_Ip6_OutOctets 统计Ip6OutOctets # HELP node_netstat_Ip6_OutOctets Statistic Ip6OutOctets.
node_netstat_IpExt_InOctets 统计IpExtInOctets # HELP node_netstat_IpExt_InOctets Statistic IpExtInOctets.
node_netstat_IpExt_OutOctets 统计IpExtOutOctets # HELP node_netstat_IpExt_OutOctets Statistic IpExtOutOctets.
node_netstat_Ip_Forwarding 统计IpForwarding # HELP node_netstat_Ip_Forwarding Statistic IpForwarding.
node_netstat_TcpExt_ListenDrops 统计TcpExtListenDrops # HELP node_netstat_TcpExt_ListenDrops Statistic TcpExtListenDrops.
node_netstat_TcpExt_ListenOverflows 统计TcpExtListenOverflows # HELP node_netstat_TcpExt_ListenOverflows Statistic TcpExtListenOverflows.
node_netstat_TcpExt_SyncookiesFailed 统计TcpExtSyncookiesFailed # HELP node_netstat_TcpExt_SyncookiesFailed Statistic TcpExtSyncookiesFailed.
node_netstat_TcpExt_SyncookiesRecv 统计TcpExtSyncookiesRecv # HELP node_netstat_TcpExt_SyncookiesRecv Statistic TcpExtSyncookiesRecv.
node_netstat_TcpExt_SyncookiesSent 统计TcpExtSyncookiesSent # HELP node_netstat_TcpExt_SyncookiesSent Statistic TcpExtSyncookiesSent.
node_netstat_TcpExt_TCPSynRetrans 统计TcpExtTCPSynRetrans # HELP node_netstat_TcpExt_TCPSynRetrans Statistic TcpExtTCPSynRetrans.
node_netstat_Tcp_ActiveOpens 统计TcpActiveOpens # HELP node_netstat_Tcp_ActiveOpens Statistic TcpActiveOpens.
node_netstat_Tcp_CurrEstab 统计TcpCurrEstab # HELP node_netstat_Tcp_CurrEstab Statistic TcpCurrEstab.
node_netstat_Tcp_InErrs 统计TcpInErrs # HELP node_netstat_Tcp_InErrs Statistic TcpInErrs.
node_netstat_Tcp_InSegs 统计TcpInSegs # HELP node_netstat_Tcp_InSegs Statistic TcpInSegs.
node_netstat_Tcp_OutSegs 统计TcpOutSegs # HELP node_netstat_Tcp_OutSegs Statistic TcpOutSegs.
node_netstat_Tcp_PassiveOpens 统计TcpPassiveOpens # HELP node_netstat_Tcp_PassiveOpens Statistic TcpPassiveOpens.
node_netstat_Tcp_RetransSegs 统计TcpRetransSegs # HELP node_netstat_Tcp_RetransSegs Statistic TcpRetransSegs.
node_netstat_Udp6_InDatagrams 统计Udp6InDatagrams # HELP node_netstat_Udp6_InDatagrams Statistic Udp6InDatagrams.
node_netstat_Udp6_InErrors 统计Udp6InErrors # HELP node_netstat_Udp6_InErrors Statistic Udp6InErrors.
node_netstat_Udp6_NoPorts 统计Udp6NoPorts # HELP node_netstat_Udp6_NoPorts Statistic Udp6NoPorts.
node_netstat_Udp6_OutDatagrams 统计Udp6OutDatagrams # HELP node_netstat_Udp6_OutDatagrams Statistic Udp6OutDatagrams.
node_netstat_UdpLite6_InErrors 统计UdpLite6InErrors # HELP node_netstat_UdpLite6_InErrors Statistic UdpLite6InErrors.
node_netstat_UdpLite_InErrors 统计UdpLiteInErrors # HELP node_netstat_UdpLite_InErrors Statistic UdpLiteInErrors.
node_netstat_Udp_InDatagrams 统计UdpInDatagrams # HELP node_netstat_Udp_InDatagrams Statistic UdpInDatagrams.
node_netstat_Udp_InErrors 统计UdpInErrors # HELP node_netstat_Udp_InErrors Statistic UdpInErrors.
node_netstat_Udp_NoPorts 统计UdpNoPorts # HELP node_netstat_Udp_NoPorts Statistic UdpNoPorts.
node_netstat_Udp_OutDatagrams 统计UdpOutDatagrams # HELP node_netstat_Udp_OutDatagrams Statistic UdpOutDatagrams.
node_network_address_assign_type address_assign_type value from /sys/class/net/ # HELP node_network_address_assign_type address_assign_type value of /sys/class/net/.
node_network_carrier carrier value from /sys/class/net/ # HELP node_network_carrier carrier value of /sys/class/net/.
node_network_carrier_changes_total carrier_changes_total value from /sys/class/net/ # HELP node_network_carrier_changes_total carrier_changes_total value of /sys/class/net/.
node_network_device_id device_id value from /sys/class/net/ # HELP node_network_device_id device_id value of /sys/class/net/.
node_network_dormant dormant value from /sys/class/net/ # HELP node_network_dormant dormant value of /sys/class/net/.
node_network_flags flags value from /sys/class/net/ # HELP node_network_flags flags value of /sys/class/net/.
node_network_iface_id iface_id value from /sys/class/net/ # HELP node_network_iface_id iface_id value of /sys/class/net/.
node_network_iface_link iface_link value from /sys/class/net/ # HELP node_network_iface_link iface_link value of /sys/class/net/.
node_network_iface_link_mode iface_link_mode value from /sys/class/net/ # HELP node_network_iface_link_mode iface_link_mode value of /sys/class/net/.
node_network_info Non-numeric data from /sys/class/net/; the value is always 1 # HELP node_network_info Non-numeric data from /sys/class/net/, value is always 1.
node_network_mtu_bytes mtu_bytes value from /sys/class/net/ # HELP node_network_mtu_bytes mtu_bytes value of /sys/class/net/.
node_network_net_dev_group net_dev_group value from /sys/class/net/ # HELP node_network_net_dev_group net_dev_group value of /sys/class/net/.
node_network_protocol_type protocol_type value from /sys/class/net/ # HELP node_network_protocol_type protocol_type value of /sys/class/net/.
node_network_receive_bytes_total Network device statistic receive_bytes # HELP node_network_receive_bytes_total Network device statistic receive_bytes.
node_network_receive_compressed_total Network device statistic receive_compressed # HELP node_network_receive_compressed_total Network device statistic receive_compressed.
node_network_receive_drop_total Network device statistic receive_drop # HELP node_network_receive_drop_total Network device statistic receive_drop.
node_network_receive_errs_total Network device statistic receive_errs # HELP node_network_receive_errs_total Network device statistic receive_errs.
node_network_receive_fifo_total Network device statistic receive_fifo # HELP node_network_receive_fifo_total Network device statistic receive_fifo.
node_network_receive_frame_total Network device statistic receive_frame # HELP node_network_receive_frame_total Network device statistic receive_frame.
node_network_receive_multicast_total Network device statistic receive_multicast # HELP node_network_receive_multicast_total Network device statistic receive_multicast.
node_network_receive_packets_total Network device statistic receive_packets # HELP node_network_receive_packets_total Network device statistic receive_packets.
node_network_speed_bytes speed_bytes value from /sys/class/net/ # HELP node_network_speed_bytes speed_bytes value of /sys/class/net/.
node_network_transmit_bytes_total Network device statistic transmit_bytes # HELP node_network_transmit_bytes_total Network device statistic transmit_bytes.
node_network_transmit_carrier_total Network device statistic transmit_carrier # HELP node_network_transmit_carrier_total Network device statistic transmit_carrier.
node_network_transmit_colls_total Network device statistic transmit_colls # HELP node_network_transmit_colls_total Network device statistic transmit_colls.
node_network_transmit_compressed_total Network device statistic transmit_compressed # HELP node_network_transmit_compressed_total Network device statistic transmit_compressed.
node_network_transmit_drop_total Network device statistic transmit_drop # HELP node_network_transmit_drop_total Network device statistic transmit_drop.
node_network_transmit_errs_total Network device statistic transmit_errs # HELP node_network_transmit_errs_total Network device statistic transmit_errs.
node_network_transmit_fifo_total Network device statistic transmit_fifo # HELP node_network_transmit_fifo_total Network device statistic transmit_fifo.
node_network_transmit_packets_total Network device statistic transmit_packets # HELP node_network_transmit_packets_total Network device statistic transmit_packets.
node_network_transmit_queue_length transmit_queue_length value from /sys/class/net/ # HELP node_network_transmit_queue_length transmit_queue_length value of /sys/class/net/.
node_network_up 1 if operstate is 'up', otherwise 0 # HELP node_network_up Value is 1 if operstate is 'up', 0 otherwise.
node_procs_blocked Number of processes blocked waiting for I/O to complete # HELP node_procs_blocked Number of processes blocked waiting for I/O to complete.
node_procs_running Number of processes in the runnable state # HELP node_procs_running Number of processes in runnable state.
node_scrape_collector_duration_seconds node_exporter: duration of a collector scrape # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
node_scrape_collector_success node_exporter: whether a collector succeeded # HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
node_sockstat_FRAG_inuse Number of FRAG sockets in state inuse # HELP node_sockstat_FRAG_inuse Number of FRAG sockets in state inuse.
node_sockstat_FRAG_memory Number of FRAG sockets in state memory # HELP node_sockstat_FRAG_memory Number of FRAG sockets in state memory.
node_sockstat_RAW_inuse Number of RAW sockets in state inuse # HELP node_sockstat_RAW_inuse Number of RAW sockets in state inuse.
node_sockstat_TCP_alloc Number of TCP sockets in state alloc # HELP node_sockstat_TCP_alloc Number of TCP sockets in state alloc.
node_sockstat_TCP_inuse Number of TCP sockets in state inuse # HELP node_sockstat_TCP_inuse Number of TCP sockets in state inuse.
node_sockstat_TCP_mem Memory used by TCP sockets, in pages # HELP node_sockstat_TCP_mem Number of TCP sockets in state mem.
node_sockstat_TCP_mem_bytes Memory used by TCP sockets, in bytes # HELP node_sockstat_TCP_mem_bytes Number of TCP sockets in state mem_bytes.
node_sockstat_TCP_orphan Number of TCP sockets in state orphan # HELP node_sockstat_TCP_orphan Number of TCP sockets in state orphan.
node_sockstat_TCP_tw Number of TCP sockets in state tw (TIME_WAIT) # HELP node_sockstat_TCP_tw Number of TCP sockets in state tw.
node_sockstat_UDPLITE_inuse Number of UDPLITE sockets in state inuse # HELP node_sockstat_UDPLITE_inuse Number of UDPLITE sockets in state inuse.
node_sockstat_UDP_inuse Number of UDP sockets in state inuse # HELP node_sockstat_UDP_inuse Number of UDP sockets in state inuse.
node_sockstat_UDP_mem Memory used by UDP sockets, in pages # HELP node_sockstat_UDP_mem Number of UDP sockets in state mem.
node_sockstat_UDP_mem_bytes Memory used by UDP sockets, in bytes # HELP node_sockstat_UDP_mem_bytes Number of UDP sockets in state mem_bytes.
node_sockstat_sockets_used Number of sockets in state used # HELP node_sockstat_sockets_used Number of sockets sockets in state used.
node_textfile_scrape_error 1 if there was an error opening or reading a file, otherwise 0 # HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
node_time_seconds System time in seconds since the epoch (1970) # HELP node_time_seconds System time in seconds since epoch (1970).
node_timex_estimated_error_seconds Estimated error in seconds # HELP node_timex_estimated_error_seconds Estimated error in seconds.
node_timex_frequency_adjustment_ratio Local clock frequency adjustment # HELP node_timex_frequency_adjustment_ratio Local clock frequency adjustment.
node_timex_loop_time_constant Phase-locked loop time constant # HELP node_timex_loop_time_constant Phase-locked loop time constant.
node_timex_maxerror_seconds Maximum error in seconds # HELP node_timex_maxerror_seconds Maximum error in seconds.
node_timex_offset_seconds Time offset between the local system and the reference clock # HELP node_timex_offset_seconds Time offset in between local system and reference clock.
node_timex_pps_calibration_total Pulse-per-second (PPS) count of calibration intervals # HELP node_timex_pps_calibration_total Pulse per second count of calibration intervals.
node_timex_pps_error_total PPS count of calibration errors # HELP node_timex_pps_error_total Pulse per second count of calibration errors.
node_timex_pps_frequency_hertz PPS frequency # HELP node_timex_pps_frequency_hertz Pulse per second frequency.
node_timex_pps_jitter_seconds PPS jitter # HELP node_timex_pps_jitter_seconds Pulse per second jitter.
node_timex_pps_jitter_total PPS count of events where the jitter limit was exceeded # HELP node_timex_pps_jitter_total Pulse per second count of jitter limit exceeded events.
node_timex_pps_shift_seconds PPS interval duration # HELP node_timex_pps_shift_seconds Pulse per second interval duration.
node_timex_pps_stability_exceeded_total PPS count of events where the stability limit was exceeded # HELP node_timex_pps_stability_exceeded_total Pulse per second count of stability limit exceeded events.
node_timex_pps_stability_hertz PPS stability, the average of recent frequency changes # HELP node_timex_pps_stability_hertz Pulse per second stability, average of recent frequency changes.
node_timex_status Value of the status array bits # HELP node_timex_status Value of the status array bits.
node_timex_sync_status Whether the clock is synchronized to a reliable server (1 = yes, 0 = no) # HELP node_timex_sync_status Is clock synchronized to a reliable server (1 = yes, 0 = no).
node_timex_tai_offset_seconds International Atomic Time (TAI) offset # HELP node_timex_tai_offset_seconds International Atomic Time (TAI) offset.
node_timex_tick_seconds Seconds between clock ticks # HELP node_timex_tick_seconds Seconds between clock ticks.
node_uname_info Labeled system information as provided by the uname system call # HELP node_uname_info Labeled system information as provided by the uname system call.
node_vmstat_pgfault /proc/vmstat field pgfault (page faults) # HELP node_vmstat_pgfault /proc/vmstat information field pgfault.
node_vmstat_pgmajfault /proc/vmstat field pgmajfault (major page faults) # HELP node_vmstat_pgmajfault /proc/vmstat information field pgmajfault.
node_vmstat_pgpgin /proc/vmstat field pgpgin (pages paged in) # HELP node_vmstat_pgpgin /proc/vmstat information field pgpgin.
node_vmstat_pgpgout /proc/vmstat field pgpgout (pages paged out) # HELP node_vmstat_pgpgout /proc/vmstat information field pgpgout.
node_vmstat_pswpin /proc/vmstat field pswpin (pages swapped in) # HELP node_vmstat_pswpin /proc/vmstat information field pswpin.
node_vmstat_pswpout /proc/vmstat field pswpout (pages swapped out) # HELP node_vmstat_pswpout /proc/vmstat information field pswpout.
node_xfs_allocation_btree_compares_total Number of allocation B-tree compares for a filesystem # HELP node_xfs_allocation_btree_compares_total Number of allocation B-tree compares for a filesystem.
node_xfs_allocation_btree_lookups_total Number of allocation B-tree lookups for a filesystem # HELP node_xfs_allocation_btree_lookups_total Number of allocation B-tree lookups for a filesystem.
node_xfs_allocation_btree_records_deleted_total Number of allocation B-tree records deleted for a filesystem # HELP node_xfs_allocation_btree_records_deleted_total Number of allocation B-tree records deleted for a filesystem.
node_xfs_allocation_btree_records_inserted_total Number of allocation B-tree records inserted for a filesystem # HELP node_xfs_allocation_btree_records_inserted_total Number of allocation B-tree records inserted for a filesystem.
node_xfs_block_map_btree_compares_total Number of block map B-tree compares for a filesystem # HELP node_xfs_block_map_btree_compares_total Number of block map B-tree compares for a filesystem.
node_xfs_block_map_btree_lookups_total Number of block map B-tree lookups for a filesystem # HELP node_xfs_block_map_btree_lookups_total Number of block map B-tree lookups for a filesystem.
node_xfs_block_map_btree_records_deleted_total Number of block map B-tree records deleted for a filesystem # HELP node_xfs_block_map_btree_records_deleted_total Number of block map B-tree records deleted for a filesystem.
node_xfs_block_map_btree_records_inserted_total Number of block map B-tree records inserted for a filesystem # HELP node_xfs_block_map_btree_records_inserted_total Number of block map B-tree records inserted for a filesystem.
node_xfs_block_mapping_extent_list_compares_total Number of extent list compares for a filesystem # HELP node_xfs_block_mapping_extent_list_compares_total Number of extent list compares for a filesystem.
node_xfs_block_mapping_extent_list_deletions_total Number of extent list deletions for a filesystem # HELP node_xfs_block_mapping_extent_list_deletions_total Number of extent list deletions for a filesystem.
node_xfs_block_mapping_extent_list_insertions_total Number of extent list insertions for a filesystem # HELP node_xfs_block_mapping_extent_list_insertions_total Number of extent list insertions for a filesystem.
node_xfs_block_mapping_extent_list_lookups_total Number of extent list lookups for a filesystem # HELP node_xfs_block_mapping_extent_list_lookups_total Number of extent list lookups for a filesystem.
node_xfs_block_mapping_reads_total Number of block map operations for reads on a filesystem # HELP node_xfs_block_mapping_reads_total Number of block map for read operations for a filesystem.
node_xfs_block_mapping_unmaps_total Number of block unmaps (deletes) for a filesystem # HELP node_xfs_block_mapping_unmaps_total Number of block unmaps (deletes) for a filesystem.
node_xfs_block_mapping_writes_total Number of block map operations for writes on a filesystem # HELP node_xfs_block_mapping_writes_total Number of block map for write operations for a filesystem.
node_xfs_extent_allocation_blocks_allocated_total Number of blocks allocated for a filesystem # HELP node_xfs_extent_allocation_blocks_allocated_total Number of blocks allocated for a filesystem.
node_xfs_extent_allocation_blocks_freed_total Number of blocks freed for a filesystem # HELP node_xfs_extent_allocation_blocks_freed_total Number of blocks freed for a filesystem.
node_xfs_extent_allocation_extents_allocated_total Number of extents allocated for a filesystem. In computer file systems, an extent (rendered in some mainland Chinese literature as "区块", i.e. block) is a contiguous area of storage. Generally, a file's physical size is an integer multiple of the extent size. When a process creates a file, the file system allocates a whole extent to it; later writes to the file (possibly after other write operations) append data after the end of the previously written data, which reduces or even eliminates file fragmentation. Reference: http://www.wikiwand.com/zh-sg/Extent_(%E6%AA%94%E6%A1%88%E7%B3%BB%E7%B5%B1) # HELP node_xfs_extent_allocation_extents_allocated_total Number of extents allocated for a filesystem.
node_xfs_extent_allocation_extents_freed_total Number of extents freed for a filesystem # HELP node_xfs_extent_allocation_extents_freed_total Number of extents freed for a filesystem.
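A table like the one above can be regenerated from a live exporter, since every metric's English description is the `# HELP` line in the Prometheus text exposition format. The sketch below only extracts those lines; it assumes you have already fetched the text (for node_exporter, typically `http://localhost:9100/metrics`, where host and port are assumptions about your setup), and the sample input is made up for illustration. It does not unescape `\\` or `\n` sequences that the format allows inside help text.

```python
import re

# "# HELP <metric_name> <help text>", per the Prometheus text exposition format.
HELP_RE = re.compile(r"^# HELP ([a-zA-Z_:][a-zA-Z0-9_:]*) (.*)$")


def parse_help_lines(text):
    """Return {metric_name: help_text} for every # HELP line in `text`."""
    helps = {}
    for line in text.splitlines():
        match = HELP_RE.match(line.strip())
        if match:
            helps[match.group(1)] = match.group(2)
    return helps


# Illustrative input only; real output would come from the exporter's /metrics page.
sample = """\
# HELP node_network_up Value is 1 if operstate is 'up', 0 otherwise.
# TYPE node_network_up gauge
node_network_up{device="eth0"} 1
# HELP node_procs_running Number of processes in runnable state.
# TYPE node_procs_running gauge
node_procs_running 3
"""

for name, help_text in parse_help_lines(sample).items():
    print(f"{name}\t{help_text}")
```

To run it against a real exporter, the sample string could be replaced with `urllib.request.urlopen(url).read().decode()` from the standard library.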

ceph_exporter metrics, compiled as of 2019-05-24

Metrics Chinese explanation English explanation
ceph_active_pgs Number of PGs in the active state # HELP ceph_active_pgs No. of active PGs in the cluster
ceph_backfill_wait_pgs Number of PGs in the backfill_wait state # HELP ceph_backfill_wait_pgs No. of PGs in the cluster with backfill_wait state
ceph_backfilling_pgs Number of PGs in the backfilling state # HELP ceph_backfilling_pgs No. of backfilling PGs in the cluster
ceph_cache_evict_io_bytes Bytes evicted from the cache pool per second # HELP ceph_cache_evict_io_bytes Rate of bytes being evicted from the cache pool per second
ceph_cache_flush_io_bytes Bytes flushed from the cache pool per second # HELP ceph_cache_flush_io_bytes Rate of bytes being flushed from the cache pool per second
ceph_cache_promote_io_ops Total cache promote operations per second # HELP ceph_cache_promote_io_ops Total cache promote operations measured per second
ceph_client_io_ops Total client operations per second # HELP ceph_client_io_ops Total client ops on the cluster measured per second
ceph_client_io_read_bytes Bytes read by clients per second # HELP ceph_client_io_read_bytes Rate of bytes being read by all clients per second
ceph_client_io_read_ops Total client read I/O operations per second # HELP ceph_client_io_read_ops Total client read I/O ops on the cluster measured per second
ceph_client_io_write_bytes Bytes written by clients per second # HELP ceph_client_io_write_bytes Rate of bytes being written by all clients per second
ceph_client_io_write_ops Total client write I/O operations per second # HELP ceph_client_io_write_ops Total client write I/O ops on the cluster measured per second
ceph_cluster_available_bytes Available space in the cluster # HELP ceph_cluster_available_bytes Available space within the cluster
ceph_cluster_capacity_bytes Total capacity of the cluster # HELP ceph_cluster_capacity_bytes Total capacity of the cluster
ceph_cluster_objects Number of RADOS objects in the cluster # HELP ceph_cluster_objects No. of rados objects within the cluster
ceph_cluster_used_bytes Cluster capacity currently in use # HELP ceph_cluster_used_bytes Capacity of the cluster currently in use
ceph_deep_scrubbing_pgs Number of PGs in the deep scrubbing state # HELP ceph_deep_scrubbing_pgs No. of deep scrubbing PGs in the cluster
ceph_degraded_objects Number of degraded objects across all PGs, including replicas # HELP ceph_degraded_objects No. of degraded objects across all PGs, includes replicas
ceph_degraded_pgs Number of PGs in the degraded state # HELP ceph_degraded_pgs No. of PGs in a degraded state
ceph_down_pgs Number of PGs in the down state # HELP ceph_down_pgs No. of PGs in the cluster in down state
ceph_forced_backfill_pgs Number of PGs in the forced_backfill state # HELP ceph_forced_backfill_pgs No. of PGs in the cluster with forced_backfill state
ceph_forced_recovery_pgs Number of PGs in the forced_recovery state # HELP ceph_forced_recovery_pgs No. of PGs in the cluster with forced_recovery state
ceph_health_status Cluster health status, one of 3 values (err:2, warn:1, ok:0) # HELP ceph_health_status Health status of Cluster, can vary only between 3 states (err:2, warn:1, ok:0)
ceph_misplaced_objects Number of misplaced objects across all PGs, including replicas # HELP ceph_misplaced_objects No. of misplaced objects across all PGs, includes replicas
ceph_monitor_quorum_count Size of the monitor quorum (number of monitors in quorum) # HELP ceph_monitor_quorum_count The total size of the monitor quorum
ceph_osd_avail_bytes OSD available storage in bytes # HELP ceph_osd_avail_bytes OSD Available Storage in Bytes
ceph_osd_average_utilization Average OSD utilization # HELP ceph_osd_average_utilization OSD Average Utilization
ceph_osd_bytes Total OSD bytes # HELP ceph_osd_bytes OSD Total Bytes
ceph_osd_crush_weight OSD Crush Weight # HELP ceph_osd_crush_weight OSD Crush Weight
ceph_osd_depth OSD depth # HELP ceph_osd_depth OSD Depth
ceph_osd_in OSD in status # HELP ceph_osd_in OSD In Status
ceph_osd_perf_apply_latency_seconds OSD perf apply latency # HELP ceph_osd_perf_apply_latency_seconds OSD Perf Apply Latency
ceph_osd_perf_commit_latency_seconds OSD perf commit latency # HELP ceph_osd_perf_commit_latency_seconds OSD Perf Commit Latency
ceph_osd_pgs OSD placement group count # HELP ceph_osd_pgs OSD Placement Group Count
ceph_osd_reweight OSD Reweight # HELP ceph_osd_reweight OSD Reweight
ceph_osd_total_avail_bytes Total available OSD storage in bytes # HELP ceph_osd_total_avail_bytes OSD Total Available Storage Bytes
ceph_osd_total_bytes Total OSD storage in bytes # HELP ceph_osd_total_bytes OSD Total Storage Bytes
ceph_osd_total_used_bytes Total used OSD storage in bytes # HELP ceph_osd_total_used_bytes OSD Total Used Storage Bytes
ceph_osd_up OSD up status # HELP ceph_osd_up OSD Up Status
ceph_osd_used_bytes OSD used storage in bytes # HELP ceph_osd_used_bytes OSD Used Storage in Bytes
ceph_osd_utilization OSD utilization # HELP ceph_osd_utilization OSD Utilization
ceph_osd_variance OSD variance # HELP ceph_osd_variance OSD Variance
ceph_osdmap_flag_full The cluster is flagged as full and cannot service writes # HELP ceph_osdmap_flag_full The cluster is flagged as full and cannot service writes
ceph_osdmap_flag_nobackfill OSDs will not be backfilled # HELP ceph_osdmap_flag_nobackfill OSDs will not be backfilled
ceph_osdmap_flag_nodeep_scrub Deep scrubbing is disabled # HELP ceph_osdmap_flag_nodeep_scrub Deep scrubbing is disabled
ceph_osdmap_flag_nodown OSD failure reports are ignored; OSDs will not be marked down # HELP ceph_osdmap_flag_nodown OSD failure reports are ignored, OSDs will not be marked as down
ceph_osdmap_flag_noin OSDs that are out will not be automatically marked in # HELP ceph_osdmap_flag_noin OSDs that are out will not be automatically marked in
ceph_osdmap_flag_noout OSDs will not be automatically marked out after the configured interval # HELP ceph_osdmap_flag_noout OSDs will not be automatically marked out after the configured interval
ceph_osdmap_flag_norebalance Data rebalancing is suspended # HELP ceph_osdmap_flag_norebalance Data rebalancing is suspended
ceph_osdmap_flag_norecover Recovery is suspended # HELP ceph_osdmap_flag_norecover Recovery is suspended
ceph_osdmap_flag_noscrub Scrubbing is disabled # HELP ceph_osdmap_flag_noscrub Scrubbing is disabled
ceph_osdmap_flag_notieragent Cache tiering activity is suspended # HELP ceph_osdmap_flag_notieragent Cache tiering activity is suspended
ceph_osdmap_flag_noup OSDs are not allowed to start # HELP ceph_osdmap_flag_noup OSDs are not allowed to start
ceph_osdmap_flag_pauserd Reads are paused # HELP ceph_osdmap_flag_pauserd Reads are paused
ceph_osdmap_flag_pausewr Writes are paused # HELP ceph_osdmap_flag_pausewr Writes are paused
ceph_osds Total number of OSDs in the cluster # HELP ceph_osds Count of total OSDs in the cluster
ceph_osds_down Number of OSDs in the DOWN state # HELP ceph_osds_down Count of OSDs that are in DOWN state
ceph_osds_in Number of OSDs in the IN state and available to serve requests # HELP ceph_osds_in Count of OSDs that are in IN state and available to serve requests
ceph_osds_up Number of OSDs in the UP state # HELP ceph_osds_up Count of OSDs that are in UP state
ceph_peering_pgs Number of PGs in the peering state # HELP ceph_peering_pgs No. of peering PGs in the cluster
ceph_pgs_remapped Number of PGs that are remapped and causing cluster-wide data movement # HELP ceph_pgs_remapped No. of PGs that are remapped and incurring cluster-wide movement
ceph_recovering_pgs Number of PGs in the recovering state # HELP ceph_recovering_pgs No. of recovering PGs in the cluster
ceph_recovery_io_bytes Bytes recovered per second # HELP ceph_recovery_io_bytes Rate of bytes being recovered in cluster per second
ceph_recovery_io_keys Keys recovered per second # HELP ceph_recovery_io_keys Rate of keys being recovered in cluster per second
ceph_recovery_io_objects Objects recovered per second # HELP ceph_recovery_io_objects Rate of objects being recovered in cluster per second
ceph_recovery_wait_pgs Number of PGs in the recovery_wait state # HELP ceph_recovery_wait_pgs No. of PGs in the cluster with recovery_wait state
ceph_scrubbing_pgs Number of PGs in the scrubbing state # HELP ceph_scrubbing_pgs No. of scrubbing PGs in the cluster
ceph_slow_requests Number of slow requests # HELP ceph_slow_requests No. of slow requests
ceph_stale_pgs Number of PGs in the stale state # HELP ceph_stale_pgs No. of stale PGs in the cluster
ceph_stuck_degraded_pgs Number of PGs stuck in the degraded state # HELP ceph_stuck_degraded_pgs No. of PGs stuck in a degraded state
ceph_stuck_requests Number of stuck requests # HELP ceph_stuck_requests No. of stuck requests
ceph_stuck_stale_pgs Number of PGs stuck in the stale state # HELP ceph_stuck_stale_pgs No. of stuck stale PGs in the cluster
ceph_stuck_unclean_pgs Number of PGs stuck in the unclean state # HELP ceph_stuck_unclean_pgs No. of PGs stuck in an unclean state
ceph_stuck_undersized_pgs Number of PGs stuck in the undersized state # HELP ceph_stuck_undersized_pgs No. of stuck undersized PGs in the cluster
ceph_total_pgs Total number of PGs in the cluster # HELP ceph_total_pgs Total no. of PGs in the cluster
ceph_unclean_pgs Number of PGs in the unclean state # HELP ceph_unclean_pgs No. of PGs in an unclean state
ceph_undersized_pgs Number of PGs in the undersized state # HELP ceph_undersized_pgs No. of undersized PGs in the cluster
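Several cluster-level metrics in this table combine naturally into a quick status summary: ceph_health_status uses the encoding err:2, warn:1, ok:0, and the used ratio is ceph_cluster_used_bytes divided by ceph_cluster_capacity_bytes. The sketch below assumes you already have the exporter's text output; the sample values (a 200 GiB cluster with 20 GiB used and one OSD down) are invented, and the helper names are hypothetical. Labeled series are skipped for simplicity.

```python
# Encoding documented for ceph_health_status: err:2, warn:1, ok:0.
HEALTH = {0: "HEALTH_OK", 1: "HEALTH_WARN", 2: "HEALTH_ERR"}


def parse_unlabeled_samples(text):
    """Map metric name -> float for unlabeled 'name value' sample lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, # HELP / # TYPE comments, and labeled series such as
        # ceph_osd_up{osd="osd.0"}, which this sketch does not handle.
        if not line or line.startswith("#") or "{" in line:
            continue
        name, value = line.split()[:2]  # ignore an optional trailing timestamp
        values[name] = float(value)
    return values


def summarize(values):
    """Combine a few cluster-level metrics into a readable summary."""
    return {
        "health": HEALTH[int(values["ceph_health_status"])],
        "used_ratio": values["ceph_cluster_used_bytes"] / values["ceph_cluster_capacity_bytes"],
        "osds_down": int(values["ceph_osds_down"]),
    }


# Invented sample input for illustration.
sample = """\
# HELP ceph_health_status Health status of Cluster, can vary only between 3 states (err:2, warn:1, ok:0)
ceph_health_status 1
ceph_cluster_capacity_bytes 214748364800
ceph_cluster_used_bytes 21474836480
ceph_osds_down 1
ceph_osd_up{osd="osd.0"} 1
"""

print(summarize(parse_unlabeled_samples(sample)))
# {'health': 'HEALTH_WARN', 'used_ratio': 0.1, 'osds_down': 1}
```

A non-zero ceph_osds_down is also the natural input for an "OSD down" alert rule in Prometheus.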

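Notes from the LinuxCast video tutorials

Problems with traditional disk management

When a partition runs out of space, it cannot be enlarged. You can only add a disk and create a new partition, but the new disk exists as a separate file system; the original file system does not grow, and upper-layer applications can often access only one file system. The only remedy is to take the existing disk offline, replace it with a new one, and then copy the original data back.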

LVM

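LVM (Logical Volume Manager) abstracts the underlying physical disks and presents them to the upper-layer system as logical volumes. A logical volume can be resized dynamically without losing existing data, and adding new disks does not change the existing logical volumes.

As a dynamic disk-management mechanism, LVM greatly improves the flexibility of disk management.

In the figure above, yellow is the VG and orange is the LV.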

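(1) First, initialize each physical disk as a physical volume (PV). Space on a PV is managed in physical extents (PEs), the basic unit of LVM space allocation; the PE size is set when the VG is created and defaults to 4 MiB.

(2) Next, create a volume group (VG). A VG holds PEs and works like a storage pool. You can add one or more PVs to a VG, and the VG's capacity is the sum of those PVs' sizes, minus a small amount used by LVM metadata. (A directory named after the VG, e.g. /dev/linuxcast/, appears under /dev and holds links to its LVs.)

(3) Finally, create logical volumes (LVs) from the VG. A single LV's space may come from several physical disks.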
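Because LVM hands out space in whole PEs, an LV size converts to an extent count by ceiling division: a size that is not a multiple of the PE size is rounded up to the next full extent. A minimal sketch, assuming the 4 MiB default PE size; the 512 and 768 figures match what lvextend reports in the transcript below.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
PE_SIZE = 4 * MiB  # LVM2's default PE size (chosen per VG with vgcreate -s)


def extents_for(size_bytes, pe_size=PE_SIZE):
    """PEs needed for an LV of size_bytes; a partial PE rounds up to a whole one."""
    return -(-size_bytes // pe_size)  # ceiling division on integers


print(extents_for(2 * GiB))   # 512
print(extents_for(3 * GiB))   # 768
print(extents_for(10 * MiB))  # 3, so the LV would end up 12 MiB
```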

Creating LVM

[root@teuthology ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 200G 0 disk
├─vda1 253:1 0 1G 0 part /boot
├─vda2 253:2 0 4G 0 part [SWAP]
└─vda3 253:3 0 195G 0 part /
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk

[root@teuthology ~]# pvcreate /dev/vdb /dev/vdc
Physical volume "/dev/vdb" successfully created.
Physical volume "/dev/vdc" successfully created.
[root@teuthology ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb lvm2 --- 100.00g 100.00g
/dev/vdc lvm2 --- 100.00g 100.00g

[root@teuthology ~]# vgcreate linuxcast /dev/vdb /dev/vdc
Volume group "linuxcast" successfully created
[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 2 0 0 wz--n- 199.99g 199.99g

[root@teuthology ~]# lvcreate -n mylv -L 2G linuxcast
Logical volume "mylv" created.
[root@teuthology ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
mylv linuxcast -wi-a----- 2.00g

[root@teuthology ~]# lvcreate -n mynewlv -L 2G linuxcast
Logical volume "mynewlv" created.
[root@teuthology ~]# ll /dev/linuxcast/
total 0
lrwxrwxrwx 1 root root 7 Apr 8 14:59 mylv -> ../dm-0
lrwxrwxrwx 1 root root 7 Apr 8 15:03 mynewlv -> ../dm-1

[root@teuthology ~]# mkfs.ext4 /dev/linuxcast/mylv
[root@teuthology ~]# mount /dev/linuxcast/mylv /mnt/
[root@teuthology ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 195G 2.6G 193G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 8.6M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 1014M 172M 843M 17% /boot
tmpfs tmpfs 783M 0 783M 0% /run/user/0
/dev/mapper/linuxcast-mylv ext4 2.0G 6.0M 1.8G 1% /mnt

# Remove LVM
[root@teuthology ~]# umount /mnt/
[root@teuthology ~]# lvremove /dev/linuxcast/mylv
Do you really want to remove active logical volume linuxcast/mylv? [y/n]: y
Logical volume "mylv" successfully removed
[root@teuthology ~]# lvremove /dev/linuxcast/mynewlv
Do you really want to remove active logical volume linuxcast/mynewlv? [y/n]: y
Logical volume "mynewlv" successfully removed
[root@teuthology ~]# lvs
[root@teuthology ~]# vgremove linuxcast
Volume group "linuxcast" successfully removed
[root@teuthology ~]# vgs
[root@teuthology ~]# pvremove /dev/vdb
Labels on physical volume "/dev/vdb" successfully wiped.
[root@teuthology ~]# pvremove /dev/vdc
Labels on physical volume "/dev/vdc" successfully wiped.
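The transcript follows a strict layering: build bottom-up (PV, VG, LV, filesystem, mount) and tear down in reverse (unmount, then LV, VG, PV). The dry-run sketch below only builds and prints the command lists in that order; the helper names are hypothetical, and actually executing them would mean passing each list to `subprocess.run` as root on a disposable machine.

```python
def lvm_create_plan(vg, pvs, lv, size, mountpoint):
    """Commands in the order used above: PV -> VG -> LV -> filesystem -> mount."""
    dev = f"/dev/{vg}/{lv}"
    return [
        ["pvcreate", *pvs],
        ["vgcreate", vg, *pvs],
        ["lvcreate", "-n", lv, "-L", size, vg],
        ["mkfs.ext4", dev],
        ["mount", dev, mountpoint],
    ]


def lvm_remove_plan(vg, pvs, lvs, mountpoint):
    """Teardown runs in reverse: unmount, then every LV, then the VG, then each PV."""
    plan = [["umount", mountpoint]]
    plan += [["lvremove", f"/dev/{vg}/{lv}"] for lv in lvs]
    plan.append(["vgremove", vg])
    plan += [["pvremove", pv] for pv in pvs]
    return plan


pvs = ["/dev/vdb", "/dev/vdc"]
for cmd in lvm_create_plan("linuxcast", pvs, "mylv", "2G", "/mnt/"):
    print(" ".join(cmd))
for cmd in lvm_remove_plan("linuxcast", pvs, ["mylv", "mynewlv"], "/mnt/"):
    print(" ".join(cmd))
```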

Extending and shrinking LVM logical volumes

[root@teuthology ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 200G 0 disk
├─vda1 253:1 0 1G 0 part /boot
├─vda2 253:2 0 4G 0 part [SWAP]
└─vda3 253:3 0 195G 0 part /
vdb 253:16 0 100G 0 disk
vdc 253:32 0 100G 0 disk

[root@teuthology ~]# pvcreate /dev/vdb /dev/vdc
Physical volume "/dev/vdb" successfully created.
Physical volume "/dev/vdc" successfully created.

[root@teuthology ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb lvm2 --- 100.00g 100.00g
/dev/vdc lvm2 --- 100.00g 100.00g

[root@teuthology ~]# vgcreate linuxcast /dev/vdb /dev/vdc
Volume group "linuxcast" successfully created

[root@teuthology ~]# lvcreate -n mylv -L 2G linuxcast
WARNING: ext4 signature detected on /dev/linuxcast/mylv at offset 1080. Wipe it? [y/n]: y
Wiping ext4 signature on /dev/linuxcast/mylv.
Logical volume "mylv" created.

[root@teuthology ~]# mkfs.ext4 /dev/linuxcast/mylv
mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=0 blocks
131072 inodes, 524288 blocks
26214 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=536870912
16 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912

Allocating group tables: done
Writing inode tables: done
Creating journal (16384 blocks): done
Writing superblocks and filesystem accounting information: done

[root@teuthology ~]# mount /dev/linuxcast/mylv /mnt/
[root@teuthology ~]# df -TH
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 210G 2.8G 207G 2% /
devtmpfs devtmpfs 4.1G 0 4.1G 0% /dev
tmpfs tmpfs 4.2G 0 4.2G 0% /dev/shm
tmpfs tmpfs 4.2G 9.0M 4.1G 1% /run
tmpfs tmpfs 4.2G 0 4.2G 0% /sys/fs/cgroup
/dev/vda1 xfs 1.1G 180M 884M 17% /boot
tmpfs tmpfs 821M 0 821M 0% /run/user/0
/dev/mapper/linuxcast-mylv ext4 2.1G 6.3M 2.0G 1% /mnt

[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 2 1 0 wz--n- 199.99g 197.99g
[root@teuthology ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
mylv linuxcast -wi-ao---- 2.00g
[root@teuthology ~]# lvextend -L +1G /dev/linuxcast/mylv
Size of logical volume linuxcast/mylv changed from 2.00 GiB (512 extents) to 3.00 GiB (768 extents).
Logical volume linuxcast/mylv successfully resized.
[root@teuthology ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
mylv linuxcast -wi-ao---- 3.00g
[root@teuthology ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 195G 2.6G 193G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 8.6M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 1014M 172M 843M 17% /boot
tmpfs tmpfs 783M 0 783M 0% /run/user/0
/dev/mapper/linuxcast-mylv ext4 2.0G 6.0M 1.8G 1% /mnt

[root@teuthology ~]# resize2fs /dev/linuxcast/mylv
resize2fs 1.42.9 (28-Dec-2013)
Filesystem at /dev/linuxcast/mylv is mounted on /mnt; on-line resizing required
old_desc_blocks = 1, new_desc_blocks = 1
The filesystem on /dev/linuxcast/mylv is now 786432 blocks long.
[root@teuthology ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 195G 2.6G 193G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 8.6M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 1014M 172M 843M 17% /boot
tmpfs tmpfs 783M 0 783M 0% /run/user/0
/dev/mapper/linuxcast-mylv ext4 2.9G 6.0M 2.8G 1% /mnt
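lvextend only grows the block device; the ext4 filesystem keeps its old size until resize2fs runs, which is why df still shows 2.0G between the two steps. The block counts can be checked by hand. A minimal sketch, assuming the 4096-byte block size reported by mke2fs above; df then shows slightly less than 3 GiB (2.9G), most likely because ext4 excludes its own metadata overhead from the size it reports.

```python
GiB = 1024 ** 3
BLOCK_SIZE = 4096  # ext4 block size reported by mke2fs for this LV


def fs_blocks(size_bytes, block_size=BLOCK_SIZE):
    """Filesystem blocks that fit in a device of size_bytes."""
    return size_bytes // block_size


def fs_bytes(blocks, block_size=BLOCK_SIZE):
    """Bytes covered by a given number of filesystem blocks."""
    return blocks * block_size


print(fs_blocks(2 * GiB))      # 524288, matches mke2fs on the 2G LV
print(fs_blocks(3 * GiB))      # 786432, matches resize2fs after lvextend
print(fs_bytes(786432) / GiB)  # 3.0
```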


[root@teuthology ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 253:0 0 200G 0 disk
├─vda1 253:1 0 1G 0 part /boot
├─vda2 253:2 0 4G 0 part [SWAP]
└─vda3 253:3 0 195G 0 part /
vdb 253:16 0 100G 0 disk
└─linuxcast-mylv 252:0 0 3G 0 lvm /mnt
vdc 253:32 0 100G 0 disk
vdd 253:48 0 100G 0 disk
[root@teuthology ~]# pvcreate /dev/vdd
Physical volume "/dev/vdd" successfully created.
[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 2 1 0 wz--n- 199.99g 196.99g
[root@teuthology ~]# vgextend linuxcast /dev/vdd
Volume group "linuxcast" successfully extended
[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 3 1 0 wz--n- <299.99g <296.99g
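Why does vgs report 199.99g for two 100G disks, and <299.99g after adding a third? Each PV reserves a metadata area at its start, and only whole PEs fit in the rest. A minimal sketch, assuming the common LVM2 default where the data area starts 1 MiB into each PV (check `pe_start` with `pvs -o +pe_start` on your system) and 4 MiB PEs. The "<" prefix is how vgs marks a value that was rounded up: 299.988... displays as 299.99, which overstates the real size, while 199.992... rounds down and needs no marker.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3
PE_SIZE = 4 * MiB
PE_START = 1 * MiB  # assumed default offset of the data area on each PV


def pv_extents(disk_bytes, pe_size=PE_SIZE, pe_start=PE_START):
    """Whole PEs that fit on a PV after its metadata area."""
    return (disk_bytes - pe_start) // pe_size


def vg_size_gib(disk_sizes):
    """VG capacity in GiB: the sum of whole PEs across all PVs."""
    extents = sum(pv_extents(d) for d in disk_sizes)
    return extents * PE_SIZE / GiB


print(pv_extents(100 * GiB))        # 25599 PEs per 100 GiB disk
print(vg_size_gib([100 * GiB] * 2))  # 199.9921875, shown by vgs as 199.99g
print(vg_size_gib([100 * GiB] * 3))  # 299.98828125, shown by vgs as <299.99g
```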

[root@teuthology ~]# umount /mnt/
[root@teuthology ~]# df -Th
Filesystem Type Size Used Avail Use% Mounted on
/dev/vda3 xfs 195G 2.6G 193G 2% /
devtmpfs devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs tmpfs 3.9G 8.6M 3.9G 1% /run
tmpfs tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/vda1 xfs 1014M 172M 843M 17% /boot
tmpfs tmpfs 783M 0 783M 0% /run/user/0
[root@teuthology ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
mylv linuxcast -wi-a----- 3.00g
[root@teuthology ~]# resize2fs /dev/linuxcast/mylv 2G
resize2fs 1.42.9 (28-Dec-2013)
Please run 'e2fsck -f /dev/linuxcast/mylv' first.

[root@teuthology ~]# e2fsck -f /dev/linuxcast/mylv
e2fsck 1.42.9 (28-Dec-2013)
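Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information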
/dev/linuxcast/mylv: 11/196608 files (0.0% non-contiguous), 30268/786432 blocks
[root@teuthology ~]# resize2fs /dev/linuxcast/mylv 2G
resize2fs 1.42.9 (28-Dec-2013)
Resizing the filesystem on /dev/linuxcast/mylv to 524288 (4k) blocks.
The filesystem on /dev/linuxcast/mylv is now 524288 blocks long.

[root@teuthology ~]# lvreduce -L -1G /dev/linuxcast/mylv
WARNING: Reducing active logical volume to 2.00 GiB.
THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce linuxcast/mylv? [y/n]: y
Size of logical volume linuxcast/mylv changed from 3.00 GiB (768 extents) to 2.00 GiB (512 extents).
Logical volume linuxcast/mylv successfully resized.
[root@teuthology ~]# lvs
LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
mylv linuxcast -wi-a----- 2.00g
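Shrinking is the reverse of extending and the order matters: unmount (resize2fs cannot shrink a mounted ext4 filesystem), force a check with e2fsck -f, shrink the filesystem, and only then shrink the LV. If the LV ends up smaller than the filesystem, the tail of the filesystem is cut off, which is exactly what lvreduce's "THIS MAY DESTROY YOUR DATA" warning is about. The hypothetical helper below only builds the command list and refuses unsafe sizes; it assumes sizes in whole GiB to keep the size strings simple.

```python
GiB = 1024 ** 3


def shrink_plan(dev, mountpoint, current_lv, reduce_by, fs_target):
    """Order used above: unmount, check, shrink the filesystem, then the LV.

    Raises ValueError if the filesystem would end up larger than the LV.
    """
    new_lv = current_lv - reduce_by
    if fs_target > new_lv:
        raise ValueError("filesystem target exceeds the new LV size; data would be cut off")
    return [
        ["umount", mountpoint],
        ["e2fsck", "-f", dev],
        ["resize2fs", dev, f"{fs_target // GiB}G"],
        ["lvreduce", "-L", f"-{reduce_by // GiB}G", dev],
    ]


for cmd in shrink_plan("/dev/linuxcast/mylv", "/mnt/", 3 * GiB, 1 * GiB, 2 * GiB):
    print(" ".join(cmd))
```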

[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 3 1 0 wz--n- <299.99g <296.99g
[root@teuthology ~]# vgreduce linuxcast /dev/vdd
Removed "/dev/vdd" from volume group "linuxcast"
[root@teuthology ~]# vgs
VG #PV #LV #SN Attr VSize VFree
linuxcast 2 1 0 wz--n- 199.99g 197.99g
[root@teuthology ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb linuxcast lvm2 a-- <100.00g <98.00g
/dev/vdc linuxcast lvm2 a-- <100.00g <100.00g
/dev/vdd lvm2 --- 100.00g 100.00g
[root@teuthology ~]# pvremove /dev/vdd
Labels on physical volume "/dev/vdd" successfully wiped.
[root@teuthology ~]# pvs
PV VG Fmt Attr PSize PFree
/dev/vdb linuxcast lvm2 a-- <100.00g <98.00g
/dev/vdc linuxcast lvm2 a-- <100.00g <100.00g
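vgreduce worked here because /dev/vdd held no allocated extents; a PV that still holds LV data is refused, and its extents would first have to be moved elsewhere with pvmove. The sketch below picks out the PVs of a VG that are completely free. It assumes machine-readable output from `pvs --noheadings --units b --nosuffix --separator : -o pv_name,vg_name,pv_size,pv_free` (exact indentation may vary between LVM versions), and the sample byte counts are illustrative values modeled on the VG before vgreduce.

```python
def removable_pvs(pvs_output, vg):
    """PVs belonging to `vg` with no allocated extents (pv_free == pv_size)."""
    removable = []
    for line in pvs_output.splitlines():
        line = line.strip()
        if not line:
            continue
        name, vg_name, size, free = line.split(":")
        if vg_name == vg and int(size) == int(free):
            removable.append(name)
    return removable


# Illustrative sample: vdb holds the 2 GiB LV, vdc and vdd are untouched.
sample = """\
  /dev/vdb:linuxcast:107369988096:105222504448
  /dev/vdc:linuxcast:107369988096:107369988096
  /dev/vdd:linuxcast:107369988096:107369988096
"""

print(removable_pvs(sample, "linuxcast"))  # ['/dev/vdc', '/dev/vdd']
```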

[root@centos7 ltp-install]# ./runltp -h

usage: runltp [ -a EMAIL_TO ] [ -c NUM_PROCS ] [ -C FAILCMDFILE ] [ -T TCONFCMDFILE ]
[ -d TMPDIR ] [ -D NUM_PROCS,NUM_FILES,NUM_BYTES,CLEAN_FLAG ] -e [ -f CMDFILES(,...) ]
[ -g HTMLFILE] [ -i NUM_PROCS ] [ -l LOGFILE ] [ -m NUM_PROCS,CHUNKS,BYTES,HANGUP_FLAG ]
-N -n [ -o OUTPUTFILE ] -p -q -Q [ -r LTPROOT ] [ -s PATTERN ] [ -t DURATION ]
-v [ -w CMDFILEADDR ] [ -x INSTANCES ] [ -b DEVICE ] [-B LTP_DEV_FS_TYPE]
[ -F LOOPS,PERCENTAGE ] [ -z BIG_DEVICE ] [-Z LTP_BIG_DEV_FS_TYPE]

# Send all reports by email to the given address
-a EMAIL_TO EMAIL all your Reports to this E-mail Address
# Run LTP with extra CPU load in the background
-c NUM_PROCS Run LTP under additional background CPU load
[NUM_PROCS = no. of processes creating the CPU Load by spinning over sqrt()
(Defaults to 1 when value)]
-C FAILCMDFILE Command file with all failed test cases.
-T TCONFCMDFILE Command file with all test cases that are not fully tested.
-d TMPDIR Directory where temporary files will be created.
-D NUM_PROCS,NUM_FILES,NUM_BYTES,CLEAN_FLAG
Run LTP under additional background Load on Secondary Storage (Seperate by comma)
[NUM_PROCS = no. of processes creating Storage Load by spinning over write()]
[NUM_FILES = Write() to these many files (Defaults to 1 when value 0 or undefined)]
[NUM_BYTES = write these many bytes (defaults to 1GB, when value 0 or undefined)]
[CLEAN_FLAG = unlink file to which random data written, when value 1]
-e Prints the date of the current LTP release
-f CMDFILES Execute user defined list of testcases (separate with ',')
-F LOOPS,PERCENTAGE Induce PERCENTAGE Fault in the Kernel Subsystems, and, run each test for LOOPS loop
-g HTMLFILE Create an additional HTML output format
-h Help. Prints all available options.
-i NUM_PROCS Run LTP under additional background Load on IO Bus
[NUM_PROCS = no. of processes creating IO Bus Load by spinning over sync()]
-K DMESG_LOG_DIR
Log Kernel messages generated for each test cases inside this directory
-l LOGFILE Log results of test in a logfile.
-m NUM_PROCS,CHUNKS,BYTES,HANGUP_FLAG
Run LTP under additional background Load on Main memory (Seperate by comma)
[NUM_PROCS = no. of processes creating main Memory Load by spinning over malloc()]
[CHUNKS = malloc these many chunks (default is 1 when value 0 or undefined)]
[BYTES = malloc CHUNKS of BYTES bytes (default is 256MB when value 0 or undefined) ]
[HANGUP_FLAG = hang in a sleep loop after memory allocated, when value 1]
-M CHECK_TYPE
[CHECK_TYPE=1 => Full Memory Leak Check tracing children as well]
[CHECK_TYPE=2 => Thread Concurrency Check tracing children as well]
[CHECK_TYPE=3 => Full Memory Leak & Thread Concurrency Check tracing children as well]
# Run all networking tests
-N Run all the networking tests.
# Run LTP with extra network traffic in the background
-n Run LTP with network traffic in background.
# Redirect test output to a file
-o OUTPUTFILE Redirect test output to a file.
# Write logfiles in a human-readable format
-p Human readable format logfiles.
# Print less verbose output to the screen; this also stops logging the start of each test in the kernel log
-q Print less verbose output to screen. This implies not logging start of the test in kernel log.
# Do not log the start of each test in the kernel log
-Q Don't log start of test in kernel log.
# Absolute path where the test suite is installed
-r LTPROOT Fully qualified path where testsuite is installed.
# Randomize the test order
-R Randomize test order.
# Run only the test cases matching PATTERN
-s PATTERN Only run test cases which match PATTERN.
# Skip the tests listed in SKIPFILE
-S SKIPFILE Skip tests specified in SKIPFILE
# Keep running the test suite for the given duration, for example:
-t DURATION Execute the testsuite for given duration. Examples:
-t 60s = 60 seconds
-t 45m = 45 minutes
-t 24h = 24 hours
-t 2d = 2 days
# Run the test suite ITERATIONS times
-I ITERATIONS Execute the testsuite ITERATIONS times.
# Fetch the user's list of test cases with wget
-w CMDFILEADDR Uses wget to get the user's list of testcases.
# Run multiple instances of the test suite
-x INSTANCES Run multiple instances of this testsuite.
# Some tests need an unmounted block device to run correctly
-b DEVICE Some tests require an unmounted block device to run correctly.
# Filesystem type of the test block device
-B LTP_DEV_FS_TYPE The file system of test block devices.
# Some tests need a large unmounted block device to run correctly
-z BIG_DEVICE Some tests require a big unmounted block device to run correctly.
# Filesystem type of the big device
-Z LTP_BIG_DEV_FS_TYPE The file system of the big device



example: runltp -c 2 -i 2 -m 2,4,10240,1 -D 2,10,10240,1 -p -q -l /tmp/result-log.3140 -o /tmp/result-output.3140 -C /tmp/result-failed.3140 -d /root/ltp-install
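The example at the end of the help packs several background load generators into one line. Below, the same options are split into a bash array with a note on each. The notes are my reading of the help text above, not LTP documentation. The `ltp_args` name and the `[ -x ./runltp ]` guard are hypothetical additions for illustration. Run from the LTP install directory, the script calls runltp. Anywhere else, it only prints the command it would run.

```shell
#!/usr/bin/env bash
# The help-text example, one option per line (annotations are interpretations
# of the runltp help output, not official LTP documentation).
ltp_args=(
  -c 2                        # 2 processes spinning on sqrt() for CPU load
  -i 2                        # 2 processes spinning on sync() for IO-bus load
  -m 2,4,10240,1              # memory load: 2 procs, 4 chunks of 10240 bytes, hang after malloc
  -D 2,10,10240,1             # storage load: 2 procs, 10 files, 10240 bytes, unlink the files
  -p                          # human-readable logfiles
  -q                          # less verbose screen output
  -l /tmp/result-log.3140     # results logfile
  -o /tmp/result-output.3140  # redirect test output here
  -C /tmp/result-failed.3140  # command file collecting failed test cases
  -d /root/ltp-install        # directory for temporary files
)

# Run only when an executable runltp is in the current directory.
if [ -x ./runltp ]; then
  ./runltp "${ltp_args[@]}"
else
  printf 'would run: ./runltp %s\n' "${ltp_args[*]}"
fi
```

Keeping the options in an array makes it easy to comment each one, or to drop a load generator, without breaking the quoting of the others.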